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Abstract 


Queueing  network  models  have  proven  to  be  cost-ef f ective 
tools  for  evaluating  and  predicting  computer  system  performance. 
This  thesis  extends  both  the  accuracy  and  the  versatility  of 
these  models.  Their  accuracy  is  enhanced  by  an  improved  approach 
to  characterizing  distributions  of  service  times;  their 


versatility  by  an  accurate  techni 
distributions  of  response  times. 

I  first  consider  how  best  to 
of  service  times  in  closed  queuei 
distributions  of  the  amount  of  se 
single  visits  to  the  various  devi 
b<=>en  recognized  that  considering 
distributions  (typically  by  assum 
exponential)  may  lead  to  unaccept 
to  reduce  this  error  have  conside 
These  efforts  are,  in  fact,  misgu 
distribution  are  highly  influence 
queueing  network,  the  tails  of  se 
minor  role  in  determining  perform 
these  distributions  near  the  orig 

I  develop  a  rather  different 
service  time  distributions  of  cus 
considers  the  locations  of  certai 
distributions  (or,  alternatively, 
transforms  at  certain  points) .  I 
empirically  that  it  offers  a  sign 
over  traditional  methods,  and  pre 
these  results  in  selecting  parame 

Next,  I  address  the  problem  o 
of  response  times  in  queueing  net 
certain  other  performance  measure 
system  response  time  is  not  chara 


que  for  approximating 

characterize  the  distributions 
ng  network  models,  i.e. ,  the 
rvice  that  customers  require  on 
ces  in  the  system.  It  has  long 
only  the  means  of  these 
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ided.  The  moments  of  a 
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rvice  time  distributions  play  a 
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approach  to  characterizing  the 
toroers  in  these  models,  which 
n  percentiles  of  these 
the  values  of  their  Laplace 
demonstrate  both  formally  and 
ificant  improvement  in  accuracy 
sent  an  algorithm  that  exploits 
ter  values  for  Cox-type  servers. 

f  determining  the  distribution 
work  models.  In  contrast  to 
s  (e.g.,  device  utilization)  , 
cterized  well  by  its  mean  value. 


Yet  previous  attempts  to  derive  the  distribution  of  response 
times  analytically  have  been  confined  to  models  with  quite 
stringent  assumptions:  the  computer  system  is  represented  by  a 
single  service  center,  customers  are  statistically 
indistinguishable,  and  (with  a  few  exceptions)  service  times  are 
exponentially  distributed. 


I  derive  an  approximation  to  the  distribution  of  response 
times  in  queueing  network  models  with  several  customer  classes, 
under  quite  general  assumptions  regarding  the  amount  of  service 
required  by  the  customers.  The  approximation  displays  excellent 
accuracy  in  comparisons  with  simulation  models  and  an  existing 
system,  and  has  interesting  implications  concerning  the  role  of 
aggregation  in  modelling  complex  computer  systems. 

These  contributions  not  only  increase  our  ability  to  model 
computer  systems  accurately  and  to  derive  performance  measures  of 
interest,  but  also  increase  our  understanding  of  the 
characteristics  of  these  systems  that  are  essential  to 
determining  their  behaviour. 
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Prologue 


Experience  tells  us  that  describing  a  queueing  system  in 
terms  of  mean  values  alone,  ignoring  other  characteristics  of  the 
various  distributions  that  arise,  often  provides  an 
insufficiently  accurate  characterization.  Many  of  the  potential 
pitfalls  are  illustrated  by  a  simple  analogy:  Toronto's 
College/Carlton  streetcar  line. 


As  a  model, 
average,  thirty 
uniformly  about 


consider  a  loop  of  track  that  requires,  on  the 
minutes  to  traverse.  Distribute  six  streetcars 
the  loop. 


Our  first  observation,  an  irrefutable  statistic,  is  that  the  mean 
time  between  streetcars  is  five  minutes.  Now,  an  individual 
naive  both  in  queueing  theory  and  in  the  ways  of  streetcars  might 
reasonably  believe  that  the  expected  waiting  time  for  a 
streetcar,  the  mean  time  from  a  traveler's  arrival  at  a  stop 
until  a  streetcar  appears,  is  two  and  one-half  minutes.  We, 
however,  know  better. 

From  our  study  of  streetcar  science  we  know  that  despite  the 
various  measures  taken  to  regulate  their  progress,  streetcars  are 
seldom  equally  spaced.  In  fact,  the  system  is  inherently 
unstable.  Once  delayed,  whether  by  traffic  congestion  or  a  game 
at  Maple  Leaf  Gardens,  a  streetcar  tends  to  fall  increasingly  far 
behind  schedule  as  it  encounters  ever  greater  numbers  of 
passengers  waiting  to  climb  aboard,  each  oblivious  to  the 
presence  of  several  empty  streetcars  a  short  distance  behind. 
Furthermore,  we  note  that  even  if  the  six  streetcars  in  the  above 
model  were  coupled  to  one  another  so  that  they  arrive 
simultaneously,  the  mean  time  between  them  would  still  be  five 
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minutes.  This  is  but 
obviously  must  expect 
this  case! 


small  consolation  to  weary  travelers,  who 
to  wait  fifteen  minutes  for  a  streetcar  in 


From  our  study  of  queueing  theory  (more  properly,  renewal 
theory)  we  know  that  the  expected  waiting  time  from  a  random 
arrival  is  equal  to  the  mean  residual  lifetime  of  the 
distribution  of  times  between  streetcars.  The  concept  of 
residual  lifetime  formalizes  a  fact  of  life  that  streetcar 
travelers  attribute  simply  to  bad  luck:  in  the  presence  of 
variability  in  the  time  between  streetcars,  a  traveler  is  more 
likely  to  arrive  during  a  long  inte r- streetcar  interval  than 
during  a  short  one.  If  the  times  between  streetcars  are  constant 
and  identical,  the  expected  waiting  time  equals  one-half  the 
mean.  If  the  times  between  streetcars  are  exponentially 
distributed,  the  expected  waiting  time  equals  the  mean. 

(Kleinrock  refers  to  this  case  as  nthe  paradox  of  mean  residual 
life,"  since  at  first  glance  it  appears  that  successive 
streetcars  are  separated  by  twice  the  mean  time  between 
streetcars.)  The  example  of  coupled  streetcars  provides  a  bound 
on  performance  degradation. 


Queueing  theory  allows  us  to  formalize  another  of  the  canons 
of  streetcar  science:  even  if  the  expected  waiting  time  is 
known,  variability  in  waiting  times  still  makes  streetcar  travel 
unpredictable.  For  example,  in  order  to  be  ninety  percent 
certain  of  reaching  a  destination  by  a  given  time,  the  traveler 
must  know  the  distribution  of  waiting  times  (as  well  as  the 
distribution  of  travel  times)  in  more  detail  than  simply  the 
mean . 


There  are  t 
be  drawn  from  t 
the  parameters 
streetcars)  in 
determine  even 
waiting  time) . 
terms  of  means 
information  for 


wo  major  lessons  (and  many  minor  ones,  as  well)  to 
he  preceding  discussion.  First,  characterizing 
of  a  queueing  system  (e.g.,  the  time  between 
terms  of  means  alone  often  is  not  sufficient  to 
the  mean  of  a  performance  measure  (e.g.,  the 
Second,  characterizing  performance  measures  in 
alone  often  does  not  provide  sufficient 
users  of  the  system.  This  thesis  addresses  these 


two  related  issues  in  the  context  of  queueing  network  models  of 


computer  systems. 


■ 


-4- 


1.  The  Milieu 


In  many  senses,  a  modern  computer  system  is  the  epitome  of  an 
industrial  organization:  a  network  of  specialized,  partially 
autonomous  processes,  working  under  the  direction  of  an 
administrative  hierarchy  to  meet  a  set  of  shared  objectives. 

This  degree  of  specialization  and  autonomy,  though,  is  a  bane  as 
much  as  a  boon.  Hillier  and  Lieberman  [1974]  note  that  "as  the 
complexity  and  specialization  of  an  organization  increases,  it 
becomes  more  and  more  difficult  to  allocate  its  available 
resources  to  its  various  activities  in  a  way  that  is  most 
effective  for  the  organization  as  a  whole." 


It  is  not  surprising,  then,  that  computer  systems,  which  were 
in  large  measure  responsible  for  the  explosive  growth  of 
operations  research  during  the  1950‘s,  have  also  been  among  its 
chief  beneficiaries.  Increasingly,  designers  and  managers  of 
these  systems  have  asked  questions  that  techniques  such  as  data 
gathering  and  trend  extrapolation  are  powerless  to  answer,  and 
the  need  for  models  of  computer  systems  has  become  widely 
accepted . 

Analytic  modelling,  and  more  particularly  queueing  network 
modelling,  has  received  much  attention  recently  as  an  attractive 
alternative  to  simulation  modelling  for  answering  certain  of 
these  questions.  A  queueing  network  is  simply  a  network  of 
queues.  More  properly,  it  is  a  network  of  service  centers,  each 
consisting  of  a  queue  and  some  number  of  servers .  Customers 
circulate  among  the  service  centers  according  to  a  set  of  fixed 
transition  probabilities.  At  each  service  center,  the 
distribution  of  service  times  and  the  scheduling  discipline  are 
specified . 


Computer  systems  can  be  represented  at  various  levels  of 
detail  as  networks  of  queues.  Figure  1,  for  example,  is  a  high- 
level  model  of  an  interactive  system.  The  service  center  at  the 
bottom  represents  the  computer  system  itself;  the  service  centers 
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at  the  top  represent  interactive  terminals.  There  is  one 
terminal  dedicated  to  each  user  of  the  system,  so  queueing  does 
not  occur  in  the  terminal  subsystem. 


Figure  2  is  a  more  detailed  model  of  a  computer  system.  The 
service  center  to  the  left  represents  the  CPU;  the  service 
centers  to  the  right  represent  various  I/O  devices.  The  jobs  in 
the  model  interleave  periods  of  CPU  activity  with  bursts  of  I/O. 

Queueing  network  models  have  proven  to  be  cost-effective 
analytic  tools  in  a  wide  variety  of  computer  system  applications. 
They  have  been  used  successfully  to  study  the  effect  on 
performance  of  changes  in  workload  and  system  configuration  [ Rose 
1975],  to  evaluate  the  impact  of  various  decisions  during  the 
design  and  implementation  of  operating  systems  [Lassettre  6 
Scherr  1972],  to  compare  the  relative  cost-effectiveness  of 
alternative  proposals  for  upgrading  system  performance  [Unruh  & 
Sevcik  1975],  and  to  derive  optimality  results  that  furnish 
insights  into  the  complex  interactions  that  arise  in  computer 
systems  [Denning  et  al.  1  976  ]. 

These  and  other  successes  have  led  to  increasingly  demanding 
applications,  accompanied  by  increasingly  stringent  requirements 
on  the  accuracy  and  versatility  of  queueing  network  models.  To  a 
great  extent,  these  challenges  have  been  met;  Chapter  2  surveys 
the  recent  extraordinary  increase  in  the  complexity  of  the  models 
for  which  it  is  feasible  to  obtain  solutions. 
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Still,  problems  remain.  Two  of  the  most  pressing  concern  the 
way  in  which  distributional  information  is  dealt  with  in  queueing 
network  models.  The  first  is  the  accurate  characterization  of 
the  service  time  distributions  of  customers.  The  second  is  the 
need  to  predict  the  distribution  of  response  times.  It  is  to 
these  two  problems  that  this  thesis  is  addressed. 

Characterizing  Service  Time  Distributions 

The  earliest  efficient  queueing  network  solution  techniques 
(for  example,  see  [Buzen  1971])  were  restricted  to  models  in 
which  all  service  times  were  exponentially  distributed.  Since 
the  exponential  distribution  has  only  one  parameter,  it  is  unable 
to  represent  any  of  the  features  of  an  observed  service  time 
distribution  except  for  the  mean. 

Given  this  rather  severe  limitation,  simple  exponential 
queueing  network  models  have  demonstrated  a  remarkable  ability  to 
predict  certain  computer  system  performance  measures  accurately. 
Moore's  early  study  of  the  Michigan  Terminal  System  [1971]  and 
Giammo's  recent  analysis  of  a  large  government  data  processing 
installation  [1976]  are  but  two  of  the  many  examples. 
Occasionally,  though,  significant  errors  arise  in  using  these 
simple  networks  to  model  even  seemingly  straightforward  computer 
systems.  An  insufficiently  accurate  characterization  of  the  CPU 
service  time  distribution  is  often  a  major  cause  of  this  error. 

Recently  developed  solution  techniques  make  it 
computationally  feasible  to  use  Erlang's  method  of  stages 
(reviewed  in  Chapter  2)  to  achieve  greater  accuracy  in 
representing  service  time  distributions.  When  errors  of  the 
above  sort  arise,  modellers  inevitably  choose  to  represent  the 
CPU  with  a  two-stage  hyperexponential  server  that  matches  the 
mean  and  variance  of  the  observed  service  time  distribution.  One 
justification  for  this  choice  is  an  intuitively  appealing 
generalization  to  queueing  networks  of  the  Pollaczek-Khinchin 
mean  value  equation,  which  (in  conjunction  with  several  other 
results)  demonstrates  that  the  mean  performance  measures  of  an 
M/G/1  queueing  system  (a  single  server  queue  with  a  Poisson 
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arrival  process  and  a  general  service  time  distribution)  are 
fully  determined  by  the  mean  and  variance  of  the  service  time 
distribution  [Kleinrock  1975]. 

The  two-stage  hyperex ponent ia 1  is  a  three- para  me te r 
distribution.  Using  it  to  match  the  first  two  moments  of  an 
observed  distribution  results  not  in  a  single  set  of  parameters, 
but  in  a  family  of  parameter  values  constrained  by  the  equations 
for  mean  and  variance.  The  third  constraint  required  for  unique 
parameter  selection  is  chosen  arbitrarily,  typically  for 
algebraic  convenience  [Sauer  &  Chandy  1975],  The  departure  point 
for  this  study  is  the  surprising  observation  that  when  the 
various  members  of  such  a  family  are  used  in  a  queueing  network 
model  they  yield  dramatically  different  predictions  for  CPU 
utilization . 

This  thesis  demonstrates  that  the  intuition  obtained  from  the 
Pollaczek-Khinchi n  equation  for  the  M/G/1  queueing  system  is  not 
applicable  in  a  queueing  network  context.  Matching  the  second  or 
even  third  moment  of  an  observed  service  time  distribution  is  not 
only  insufficient,  but  also  unnecessary  and  frequently 
misleading.  Informally,  this  apparent  anomaly  results  from  the 
fact  that  while  the  moments  of  a  distribution  are  strongly 
influenced  by  its  tail,  the  tail  of  a  service  time  distribution 
does  not  play  an  important  role  in  systems  with  finite  customer 
populations. 

For  a  given  server  complexity,  significantly  greater  accuracy 
can  be  obtained  by  attempting  to  match  certain  values  of  the 
cumulative  distribution  function  of  the  observed  service  time 
distribution,  or  alternatively,  certain  values  of  its  Laplace 
transform.  I  describe  a  number  of  modelling  experiments  using  a 
procedure  that  selects  parameters  for  multiple- stage  servers 
based  upon  the  location  of  the  mean  and  selected  percentiles  of 
the  observed  distribution.  I  also  present  an  algorithm  that 
directly  matches  certain  Laplace  transform  values  of  an  observed 
service  time  distribution. 

These  results  constitute  an  improved  methodology  for 
representing  general  service  time  distributions  in  queueing 
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network  models  of  computer  systems.  Their  sound  basis  leads  to 
greater  confidence  in  the  predictive  ability  of  these  models. 


Determining  Response  Time  Distributions 


Queueing  network  models  have  demonstrated  their  usefulness  in 
assessing  the  impact  of  configuration  and  load  changes  on  system- 
oriented  performance  measures  such  as  mean  device  utilization. 
Increasingly,  though,  computer  systems  support  a  class  of 
customers  for  whom  response  time  is  the  performance  measure  of 
greatest  interest.  That  the  mean  response  time  provides  an 
inadequate  characterization  of  system  performance  is  perhaps  best 
illustrated  by  the  following  psychological  study  of  user 
satisfaction,  which  belongs  to  the  performance  evaluation 
folklore  [Metzger  1972]: 

The  terminal  handler  of  an  interactive  system  was  modified  so 
that  any  response  ready  before  a  fixed  amount  of  time  had  elapsed 
since  the  request  was  delayed  until  that  time  had  passed.  In 
doing  so,  the  mean  of  the  response  time  distribution  was 
increased,  but  its  variance  was  decreased.  Uniformly,  users 
perceived  a  performance  improvement  in  the  system. 


The  distribution  of  response  times  has  not  previously  yielded 
to  analysis  via  queueing  network  models.  Several  derivations 
have  been  carried  out  using  classical  finite  population,  single 
queue  models  to  represent  the  computer  system,  but  the  degree  to 
which  system  characteristics  must  be  aggregated  in  such  models 
makes  the  distributional  information  suspect,  at  best. 


These  analyses  {[Coffman  et  al.  1970  ],  [Muntz  1972]  and 
[Sekino  1972]  are  examples)  suffer  from  a  number  of  other 
shortcomings.  Without,  exception,  they  assume  that  all  customers 
are  statistically  identical.  For  the  most  part,  they  require 
that  the  distribution  of  service  times  be  exponential.  Finally, 
the  form  of  the  analytic  results  is  often  so  complex  that  no 
general  conclusions  can  be  drawn  concerning  the  behaviour  of  the 
system.  (The  recent  approximate  results  by  Wong,  [Wong  1977]  and 
[Wong  6  Muntz  1976],  are  notable  exceptions.) 
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This  thesis  reconsiders  the  determination  of  the  response 
time  distribution  in  central  server  queueing  networks,  using  a 
simple  Laplace  transform  analysis.  First,  an  approximation  to 
the  percentiles  of  the  response  time  distribution  for  a  customer 
whose  number  of  cycles  through  the  network  is  geometrically 
distributed  (the  usual  assumption  in  queueing  network  models)  is 
derived,  for  arbitrary  service  time  distributions  and  scheduling 
disciplines.  Next,  similar  results  are  obtained  when  the  number 
of  cycles  required  by  the  customer  is  known  precisely.  This 
approximation  involves  a  more  detailed  analysis  and  requires 
certain  restrictions  on  scheduling  disciplines  and  service  time 
distributions.  Finally,  these  two  results  are  combined  to  yield 
an  approximation  to  the  percentiles  of  the  response  time 
distribution  for  a  customer  with  an  arbitrarily  distributed  cycle 
requirement. 


The  results  have  an  extremely  simple  form,  but  display 
excellent  accuracy  in  comparisons  with  simulation  models  and  an 
existing  system.  They  provide  a  firm  basis  upon  which 
hierarchical  models  of  more  complex  computer  systems  can  be 
built . 


The  Organization  of  the  Thesis 

Before  describing  the  structure  of  the  remainder  of  the 
thesis,  it  is  appropriate  to  highlight  a  few  of  the  less  obvious 
themes  that  serve  to  unite  this  work: 

□  Each  analysis  is  based  upon  a  central  server  queueing  network 
model  in  which  the  CPU  has  a  general  service  time  distribution 
and  a  first-come- f irst- served  scheduling  discipline.  This  model 
is  more  complex  than  those  usually  encountered  in  analytic  work; 
it  does  not  possess  the  property  of  local  balance,  and  thus  is 
not  amenable  to  exact  solution  by  efficient  techniques.  This 
degree  of  realism  is  required  in  many  applications,  however,  and 
recent  advances  in  approximation  techniques  make  it  feasible  to 
d»al  with  models  of  this  complexity. 

n  A  common  set  of  analytic  techniques  is  used  throughout.  For 
instance,  the  fact  that  single-server  M/G/1  queues  can  be  used  to 
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represent  certain  closed  queueing  networks  is  exploited  in 
several  places,  and  an  analysis  in  terms  of  Laplace  transforms 
leads  to  each  result. 


□  Considered  together,  the  results  have  interesting 
implications  concerning  the  ability  of  aggregation  techniques 

[ Courtois  1977]  and  hierarchical  modelling  [Browne  et  al.  1975] 
to  accurately  predict  various  performance  measures  for  complex 
computer  systems. 

□  The  solutions  are  "engineer ing-oriented"  in  that  they 
significantly  increase  our  ability  to  model  computer  systems 
accurately,  derive  performance  measures  of  interest,  and  have 
confidence  in  the  predictive  ability  of  our  models. 

The  remainder  of  the  thesis  comprises  four  chapters: 

Chapter  2  introduces  a  number  of  the  techniques  that  will  be 
used  in  the  remainder  of  the  thesis.  Of  special  importance  are 
the  sections  dealing  with  the  method  of  stages,  with  queueing 
network  solution  techniques,  and  with  Laplace  transforms. 

Chapter  3  is  concerned  with  characterizing  service  time 
distributions  in  closed  queueing  network  models.  In  Chapter  4, 
approximations  to  the  distribution  of  response  times  in  these 
models  are  discussed.  These  chapters  are  largely  self-contained, 
and  each  concludes  with  a  summary  of  the  specific  results  that  it 
c  cnta ins. 


Chapter  5  attempts  to 
perspective,  and  examines 
results.  Some  directions 


place  the  work  of  the  thesis  in 
several  general  implications  of  the 
for  future  research  are  suggested. 


-11- 


2.  An  Introduction  to  the  Techniques  of  the  Thesis 


The  specific  contributions  of  this  thesis  concern  the 
accuracy  with  which  queueing  network  models  can  represent 
computer  systems,  and  the  range  of  performance  measures  that  can 
be  obtained  from  these  models.  The  thesis  builds  upon  the  large 
body  of  work  that  is  concerned  with  efficient  techniques  for  the 
solution  of  queueing  network  models.  The  purpose  of  this  chapter 
is  to  set  the  stage  for  the  results  to  follow,  by  describing  that 
work  and  by  introducing  those  analytic  techniques  that  are  of 
special  relevance. 


The  first  section  traces  the  path  that  leads  from  the  basic 
M/M/1  queueing  system  to  queueing  networks  composed  of 
exponential  service  centers.  The  global  balance  equation 
solution  technique  is  described. 

Erlang's  method  of  stages,  a  technique  for  modelling  service 
time  distributions  of  nearly  arbitrary  complexity,  is  explored  in 
the  second  section.  The  results  in  Chapter  3  are,  in  a  sense, 
concerned  with  how  to  exploit  this  method  most  effectively. 


The  third  sec 
the  solution  of  q 
areas  of  research 
techniques,  and  a 
admit  to  efficien 
form  the  foundati 
an  understanding 
essential.  In  Ch 
response  times  is 
already  provided 


tion  summarizes  a  number  of  recent  advances  in 
ueueing  network  models.  Two  closely-related 
are  involved:  efficient  exact  solution 
pproximation  techniques  for  models  that  will  not 
t  exact  solution.  These  solution  techniques 
on  for  the  results  in  subsequent  chapters,  and 
of  their  capabilities  and  limitations  is 
apter  4,  an  approximation  to  the  distribution  of 
derived  in  terms  of  performance  measures 
by  these  techniques. 


The  final  section  of  this  chapter  provides  an  informal 
introduction  to  Laplace  transforms,  the  tool  used  to  obtain  many 
of  the  results  that  follow. 
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From  M/M/1  Queue  to  Exponential  Queueing  Network 


The  building  block  of  all  queueing  network  models  is  the 
single  server  queue,  illustrated  in  Figure  1.  (Cohen’s  rather 
formidable  monograph  [1969]  is  the  ultimate  reference  work  on  the 
single  server  queue.  Kleinrock  [1975]  provides  a  more  suitable 
introduction,  and  is  the  source  of  much  of  the  material  in  this 
and  the  next  section.)  The  basic  parameters  of  this  queueing 
system  describe  the  arrival  process,  the  service  process  and  the 
number  of  servers.  In  the  standard  notation,  an  M/M/1  queue  has 
a  Markovian  (Poisson)  arrival  process,  a  Markovian  (exponential) 
service  time  distribution,  and  a  single  server.  Similarly,  an 
M/G/1  queue  has  a  Markovian  arrival  process,  a  general  service 
time  distribution,  and  a  single  server.  The  scheduling 
discipline  is  assumed  to  be  f irst-come-f irst-served  unless 
otherwise  specified.  Specialized  variants  are  frequently 
encountered,  and  have  been  used  with  considerable  success  to 
model  computer  systems.  For  example,  this  thesis  relies  upon 
results  for  the  M/G/1  system  with  a  fixed  number  of  customers 
(referred  to  as  a  finite  source  model)  and  the  M/G/1  system  with 
finite  waiting  room  (or  finite  capacity  queue). 


Figure  1  Figure  2 

For  the  present,  however,  the  discussion  will  be  restricted 
to  M/M/1  queueing  systems.  The  reason  is  the  analytic  simplicity 
that  arises  from  the  well-known  memoryless  £ro£erty  of  the 
exponential  distribution.  This  property  implies  that  when 
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observing  a  sequence  of  events,  the  distribution  of  time  until 
the  next  event  is  independent  of  the  aaount  of  time  that  has 
passed  since  the  last  event.  From  the  point  of  view  of  a 
customer  arriving  at  an  M/M/1  gueue,  the  memoryless  property 
implies  that  the  time  until  the  customer  presently  in  service 
completes  is  independent  of  the  amount  of  service  already 
received;  from  the  point  of  view  of  the  service  center,  it 
implies  that  the  time  until  the  next  customer  arrives  is 
independent  of  the  time  since  the  previous  customer  arrived. 

As  a  result  of  this  property,  the  state  of  an  M/M/1  queueing 
system  is  entirely  described  by  the  number  of  customers  present. 
It  is  not  necessary  to  remember  the  aaount  of  time  that  has 
elapsed  since  the  last  customer  arrived,  or  since  the  customer 
presently  in  service  began  execution.  An  M/M/1  system  in  state  k 
(i.e.,  with  k  customers  present)  will  enter  state  k+1  at  a  rate 
equal  to  the  rate  at  which  customers  arrive,  and  will  enter  state 
k-1  at  a  rate  equal  to  the  rate  at  which  customers  are  served. 
Figure  2  illustrates  the  state  transition  diagram  for  an  M/M/ 1 
queueing  system  with  arrival  rate  L  and  service  rate  D. 


The  fundamental  statistic  in  this  and  similar  systems  is  the 
equilibrium  probability  distribution  of  system  state.  From  this 
distribution,  we  can  derive  performance  measures  such  as  server 
utilization  (one  minus  the  probability  that  no  customers  are 
present),  the  mean  and  distribution  of  queue  length  (directly), 
and  the  mean  waiting  time  (from  queue  length  statistics  and 
Little’s  formula,  which  states  that  under  very  general 
assumptions,  the  average  number  of  customers  at  a  queueing  system 
is  equal  to  the  arrival  rate  of  customers  times  their  average 
waiting  time.) 


The  equilibrium  distribution  of  system  state  can  be  obtained 
by  constructing  and  solving  a  system  of  linear  equations  known  as 
the  global  balance  equations,  which  express  the  fact  that,  at 
equilibrium,  the  total  rate  of  flow  into  each  state  must  egua 1 
the  total  rate  of  flow  out  of  that  state.  This  global  balance 
equation  solution  technique  is  extremely  powerful.  In  fact,  it 
allows  us  to  solve  arbitrarily  complex  networks  of  interconnected 
exponential  service  centers. 
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Consider  a  closed  queueing  network  with  N  customers  and  M 
service  centers.  (By  closed,  we  mean  that  the  number  of 
customers  in  the  network  is  fixed.  In  an  open  network,  customers 
may  arrive  and  depart.  Closed  and  open  queueing  networks 
correspond  to  finite  and  infinite  source  single  server  queues, 
respectively.)  Since  the  service  time  distribution  at  each 
service  center  has  the  memoryless  property,  the  state  of  such  a 
network  is  entirely  described  by  the  distribution  of  customers 
among  the  various  service  centers.  The  state  space  is  both 
discrete  and  finite;  the  global  balance  equations  can  be 
constructed  and  solved,  and  performance  measures  derived. 

As  always,  though,  there  are  some  flies  in  the  ointment.  The 
first  is  cur  inability  to  represent  general  service  time 
distributions  within  our  present  balance  equation  framework.  The 
second  is  the  sometimes  prohibitive  cost  of  this  solution 
technique.  These  issues  will  be  addressed  in  the  next  two 
sections. 

The  Method  of  Stages 

The  global  balance  eguation  solution  technique  relies  upon  an 
ability  to  identify  discrete  system  states.  This,  in  turn, 
relies  upon  the  memoryless  property  of  the  service  time 
distributions  in  the  queueing  network.  Unfortunately,  the 
exponential  distribution  is  the  only  continuous  distribution 
possessing  this  property,  and,  as  we  will  see  in  Chapter  3,  it 
frequently  provides  an  insufficiently  accurate  characterization 
of  customer  behaviour. 

A  number  of  years  ago.  Erlang  developed  a  method  for 
representing  certain  distributions  by  a  sequence  of  exponential 
stages  f Brockmeyer  et  al.  1948].  His  technigue  was  generalized 
several  times,  culminating  in  the  work  of  Cox  [1955], 

The  Cox  server,  illustrated  in  Figure  3,  has  R  exponential 
stages  with  mean  service  times  Tr,  1<r<B.  Prior  to  entering  the 
rth  stage  of  service,  a  customer  leaves  the  server  with 
probability  Pr .  At  any  particular  instant,  only  one  customer  can 
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be  receiving  service  at  a  Cox  server;  any  additional  customers 
must  queue  ahead  of  the  first  stage. 


Figure  3 

In  principle,  most  reasonable  service  time  distributions 
(those  with  rational  Laplace  transforms)  can  be  modelled 
arbitrarily  closely  by  a  Cox  server  of  sufficient  complexity. 

The  virtue  of  this  technique  is  that,  since  each  stage  of  service 
is  exponential,  the  state  space  of  the  service  center  is 
discrete:  to  the  number  of  customers  present,  we  need  add  only 

the  current  stage  of  service  of  the  active  customer. 

In  practice,  the  use  of  the  method  of  stages  has  been 
restricted  to  two  forms.  The  two-stage  hyper exponent ial  server, 
illustrated  in  Figure  4,  can  match  the  first  two  moments  of 
distributions  whose  coefficient  of  variation  is  greater  than  one 
(the  coefficient  of  variation  is  equal  to  the  standard  deviation 
divided  by  the  mean) . 
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Figure  4 


Figure  5 


The  K- stage  generalized  Erlang  server,  illustrated  in  Figure  5, 
can  natch  the  first  two  moments  of  distributions  whose 
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coefficient  of  variation  is  less  than  one  (although  the  number  of 
stages  required  grows  rapidly  as  the  coefficient  of  variation 
approaches  zero). 

There  are  two  reasons  for  restricting  Cox-type  servers  to  one 
of  these  forms.  (Regardless  of  their  precise  topology,  servers 
that  exploit  the  method  of  stages  are  referred  to  as  Cox- type 
servers.)  The  first  is  a  com monly- held  belief  that  the  mean  and 
variance  of  a  service  time  distribution  characterize  it 
adequately.  The  second  is  the  lack  of  a  systematic  approach  to 
parameter  selection  for  the  general  Cox  server.  Chapter  3  of 
this  thesis  addresses  both  of  these  topics. 


The  Efficient  Solution  of  Queueing  Network  Models 


Generality  has  its  price.  Consider  a  closed  queueing  network 
with  N  customers  and  M  exponential  service  centers.  The  number 
of  states  of  this  model  is 

/ N  +  M-  1  ^ 

1  N  j 

This  number  grows  exceedingly  rapidly.  For  instance,  an 
exponential  queueing  network  model  of  a  system  with  a  CPU,  ten 
I/O  devices  and  ten  customers  would  have  184,756  states! 

Other  factors  can  make  our  plight  even  worse.  Should  the 
model  include  several  customer  classes,  each  with  a  unique  mean 
service  time  at  each  service  center,  the  number  of  states  can  be 
as  large  as 


where  S  is  the  number  of  customer  classes,  and  Ni  is  the  number 
of  customers  in  class  i,  1<i<S.  Finally,  as  a  rough  estimate, 
should  any  service  centers  have  Cox-type  servers,  the  above 
formulations  must  be  multiplied  by  the  number  of  stages  at  each 
such  center. 

Since  there  is  a  global  balance  equation  corresponding  to 
each  state,  this  exponential  growth  in  the  size  of  state  spaces 
limits  the  applicability  of  the  global  balance  equation  solution 
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technique  to  relatively  simple  queueing  networks.  It  becomes 
prohibitively  expensive  merely  to  enumerate  state  spaces  in 
excess  of  several  thousand  states;  the  solution  of  linear  systems 
of  this  size  is  costly  in  terms  of  both  space  and  time,  since  the 
matrices  lack  the  sparseness  and  structure  to  make  them  amenable 
to  existing  efficient  techniques.  Queueing  network  research  to 
circumvent  this  difficulty  has  taken  two  directions.  The  first 
involves  efficient  solution  techniques  for  queueing  networks  that 
possess  a  property  known  as  local  balance.  The  second  involves 
approximation  techniques  for  queueing  networks  that  do  not 
possess  this  property. 


•  Efficient  Exact  Solution  Techniques 

In  two  landmark  papers,  Jackson  [1957,  1963]  demonstrated 
that  in  open  and  closed  queueing  networks  composed  of  exponential 
service  centers  at  which  the  service  rates  may  depend  upon  the 
queue  lengths,  the  equations  governing  the  equilibrium 
distribution  of  system  state  have  a  product  form.  Subsequently, 
Gordon  and  Newell  [  1967]  exploited  this  product  form  in 
developing  a  separation  of  variables  solution  technique  for 
closed  exponential  networks. 

In  a  general  context,  this  means  that  the  probability  that 
the  queueing  network  is  in  a  particular  state  equals  the  product 
of  the  probabilities  that  each  individual  service  center  is  in 
the  corresponding  state,  divided  by  a  normalizing  constant  which 
ensures  that  the  equilibrium  probabilities  for  the  network  sum  to 
one.  In  other  words: 

P[S1 . V  =  l  prv  •••  P[V 

where  G  is  the  normalizing  constant  and  P[Si]  is  the  probability 
that  the  state  of  the  ith  service  center  is  Si,  1<i<M,  Because 
the  service  centers  can  be  considered  independently  of  one 
another,  and  because  the  normalizing  constant  can  be  calculated 
without  enumerating  the  states  [Buzen  1971],  this  product  form 
allows  the  computationally  efficient  analysis  of  large  queueing 
networks. 
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Recently,  the  class  of  queueing  networks  with  product  form 
solutions  has  been  considerably  extended.  It  has  been  shown  that 
all  queueing  networks  possessing  a  property  known  as  local 
balance  have  solutions  of  this  form.  Informally,  local  balance 
exists  when  the  global  balance  equations  can  be  divided  into 
consistent  subsets;  the  rate  of  flow  out  of  a  particular  state 
due  to  a  customer  leaving  a  specific  service  center  is  equal  to 
the  rate  of  flow  into  that  state  due  to  a  customer  arriving  at 
the  same  service  center.  (The  notion  generalizes  to  multiple 
customer  classes  and,  for  certain  scheduling  disciplines,  to  Cox- 
type  servers.)  Much  recent  effort  has  also  been  devoted  to 
developing  efficient  algorithms  to  compute  the  normalizing 
constant  and  various  performance  measures. 

The  classical  queueing  network  model  of  computer  systems  is 
Buzen's  exponential  central  server  model  [1971,  1973], 
illustrated  in  Figure  2  of  Chapter  1.  This  model  has  a  single 
customer  class  and  exponential  servers  at  each  service  center. 

It  is  a  special  case  of  the  queueing  networks  shown  to  have 
product  form  solutions  by  Jackson  and  by  Gordon  and  Newell,  but 
it  is  of  special  importance  for  two  reasons.  First,  Buzen  was 
the  first  to  develop  computationally  efficient  procedures  for 
determining  the  normalizing  constant,  equilibrium  distribution  of 
system  state,  throughputs,  and  queue  length  distributions. 
(Equivalent  techniques  were  independently  developed  by  Moore 
[  1972],  by  Reiser  and  Kobayashi  [  1973]  and  by  Muntz  and  Wong 
[1974].)  Second,  this  model  illustrates  the  first  of  several 
rather  remarkable  "doesn't  matter"  results:  in  a  queueing 
network  possessing  local  balance,  the  scheduling  discipline  at  a 
service  center  with  an  exponential  service  time  distribution  has 
no  effect  on  the  equilibrium  behaviour  of  the  model,  provided 
that  no  a  priori  information  concerning  the  precise  service  times 
of  specific  customers  is  used.  (The  reader  may  have  noticed  that 
scheduling  disciplines  have  received  short  shrift  thus  far  in  our 
discussion ! ) 

Baskett  r 1 97 1  ]  showed  that  a  central  server  model  with  a 
processor  shared  CPU  ( processor  sharing  is  the  limiting  case  of 
round  robin  as  the  quantum  length  goes  to  zero)  and  exponential 
I/O  service  centers  has  a  product  form  solution,  given  some  very 
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general  restrictions  on  the  CPU  service  time  distribution.  In 
doing  so,  Baskett  proved  the  second  of  the  "doesn't  »atter" 
results:  in  a  queueing  network  possessing  local  balance,  the 

higher  moments  of  the  service  time  distribution  (all  moments 
except  the  mean)  at  a  service  center  with  the  processor  sharing 
scheduling  discipline  have  no  effect  on  equilibrium  behaviour. 

At  service  centers  where  no  queueing  occurs,  such  as  the 
interactive  terminals  in  the  model  illustrated  in  Figure  1  of 
Chapter  1,  neither  the  scheduling  discipline  nor  the  higher 
moments  of  the  service  time  distribution  affect  equilibrium 
behaviour. 

Chandy  [1972]  extended  these  results  to  arbitrarily 
interconnected  networks  of  queues,  and  added  last -come-f irst- 
served  preemptive-resume  to  the  list  of  "doesn't  matter" 
scheduling  disciplines.  Finally,  Baskett  et  al.  [1975]  obtained 
a  product  form  solution  for  networks  that  extend  the  above 
results  to  include  multiple  customer  classes.  This  latter  paper 
is  of  particular  significance  because  it  synthesizes  a  number  of 
diverse  results  into  a  single,  unified  theory. 

Unfortunately,  we  still  await  the  discovery  of  necessary 
conditions  for  the  existence  of  a  product  form  solution. 
Sufficient  conditions  can  be  established  by  considering  the 
service  centers  individually.  Muntz  [1973]  shows  it  is 
sufficient  that  each  service  center  have  the  M  implies  M 
property:  a  Poisson  departure  process  in  the  presence  of  a 

Pcisson  arrival  process.  Chandy  et  al.  [1977]  formally  describe 
a  sufficient  condition  referred  to  as  station  balance;  for  our 
present  purposes,  the  relevant  interpretation  of  this  property  is 
that  each  service  center  in  the  network  must  have  either  an 
exponential  service  time  distribution,  or  a  scheduling  discipline 
under  which  the  higher  moments  of  the  service  time  distribution 
(those  above  the  mean)  do  not  affect  the  equilibrium  behaviour  of 
the  model.  Further,  priority  scheduling  disciplines  are 
excluded. 

On  the  computational  front,  the  work  of  Buzen,  Moore,  Reiser 
and  Kobayashi,  and  Muntz  and  Hong  was  followed  by  Chandy  et  al. 
[1975a],  Reiser  and  Kobayashi  [1975]  and  Hong  [1975],  each  of 
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vhom  developed  efficient  coaputat ional  algorithms  for  the  aost 
general  known  class  of  queueing  networks  with  product  fora 
solutions,  that  studied  by  Baskett  et  al.  Shua  [1977]  has 
recently  attempted  to  unify  the  various  computational  approaches 
using  a  random  variables  interpretation. 

In  summary,  efficient  computational  techniques  are  available 
to  solve  queueing  network  models  with  a  wide  range  of 
characteristics.  Excluded,  however,  are  service  centers  at  which 
moments  of  the  service  time  distribution  other  than  the  mean 
affect  the  equilibrium  behaviour  of  the  model,  as  well  as  many 
interesting  scheduling  disciplines.  For  this  reason,  it  has  been 
necessary  to  devise  accurate  and  efficient  approximation 
techniques  to  solve  queueing  network  models  that  do  not  possess 
the  local  balance  property. 


•  Approximation  Techniques 

Both  simulation  and  the  global  balance  equation  solution 
technique  can  be  considered  when  the  modeller  is  confronted  with 
a  queueing  network  that  is  not  amenable  to  efficient  exact 
analysis.  However,  the  computational  effort  required  by  these 
approaches  renders  them  impractical  for  large  problems.  A  more 
reasonable  alternative  is  the  application  of  an  approximate 
solution  technique,  i.e.,  a  technique  that  obtains  either  an 
approximate  solution  to  the  original  model,  or  an  exact  solution 
to  an  altered  model.  In  either  case,  the  objective  is  to  obtain 
results  reasonably  close  to,  but  at  a  substantial  computational 
savings  from,  those  obtainable  by  the  exact  solution  of  the 
original  model. 

The  approximation  technique  that  has  received  the  widest 
attention  is  Norton's  theorem  for  queueing  networks  [Chandy  et 
al.  1975a],  an  analogue  of  Norton's  theorem  in  electrical  circuit 
theory.  A  Norton's  theorem  reduction  in  a  queueing  network  model 
involves  replacing  a  subnetwork  that  has  a  single  input  stream 
and  a  single  output  stream  (Figure  6)  with  a  single  composite 
service  center  (Figure  7) .  The  composite  service  center  is 
intended  to  behave,  with  respect  to  the  rest  of  the  queueing 
network,  exactly  as  did  the  original  subnetwork.  If  Norton's 
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theorem  is  judiciously  applied,  the  resulting  reduced  queueing 
network  will  be  sufficiently  simple  to  permit  analysis  by  the 
global  balance  technique,  or  by  the  recursive  technique  of  Herzog 
et  al.  [  1975]. 


Figure  6 


Figure  7 


Chandy  et  al.  present  an  algorithm  for  selecting  the 
characteristics  of  the  composite  service  center  that  is  quite 
similar  to  the  analogous  procedure  in  an  electrical  circuit. 

They  then  demonstrate  that  if  the  original  queueing  network 
possesses  the  local  balance  property,  a  Norton’s  theorem 
reduction  of  a  subsystem  is  exact  in  the  sense  that  the  queue 
length  distributions  at  service  centers  not  in  the  subsystem  are 
identical  in  the  original  and  reduced  networks.  This,  of  course, 
is  precisely  the  case  in  which  approximation  techniques  are  not 
necessary.  However,  in  queueing  networks  that  do  not  possess 
local  balance,  the  application  of  Norton’s  theorem  frequently 
results  in  only  minor  error. 

A  great  deal  of  effort  has  been  devoted  to  improved  methods 
for  selecting  the  characteristics  of  the  composite  service  center 
when  the  original  queueing  network  does  not  possess  local 
balance.  Major  contributions  have  been  made  by  Chandy  et  al. 
[1975b]  and  Sauer  and  Chandy  [1975].  The  recent  paper  by  Sevcik 
et  al.  [1977]  analyzes  the  sources  of  the  remaining  error,  and 
proposes  several  new  approaches  that  show  promise  of  further 
reducing  this  error.  Sevcik  et  al.  emphasize  that  the  departure 
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process  of  the  composite  service  center  oust  Batch  that  of  the 
aggregated  subnetwork.  Norton's  theorem  for  queueing  networks  is 
an  important  and  effective  tool;  it  will  be  used  in  Chapter  3  of 
this  thesis. 

One  other  approximation  technique  is  of  special  relevance: 
the  extended  product  form  (EPF)  solution  method  recently 
developed  by  Shum  [1977],  Shum  begins  from  the  hypothesis  that 
queueing  networks  with  general  service  time  distributions  and 
f irst-come-f irst- served  scheduling  do,  in  fact,  have  product  form 
solutions.  Rather  than  constructing  the  product  from  the 
solutions  to  classical  M/M/1  queueing  systems  representing  the 
various  service  centers  in  the  network,  though,  the  EPF  method 
postulates  that  the  solutions  to  M/G/1  queueing  systems  should  be 
used.  (In  fact,  the  solutions  to  M/G/1  queueing  systems  with 
finite  waiting  room  are  employed.) 

Shum  devotes  a  great  deal  of  effort  to  the  experimental 
validation  of  the  EPF  method,  and  in  particular  of  an  iterative 
search  algorithm  needed  to  determine  a  critical  parameter  (the 
absolute  arrival  rate  to  the  various  single  server  queues).  In 
general,  the  technique  demonstrates  excellent  accuracy  and 
e  £  f  ic  ie  nc  y  . 

The  EPF  solution  method  relates  to  the  work  in  Chapter  3  of 
this  thesis  in  two  ways.  First,  the  H/G/1  queueing  system  with 
finite  waiting  room  plays  a  major  role  in  each  development. 
Second,  the  techniques  developed  in  Chapter  3  may  profitably  be 
used  to  select  server  parameters  for  the  EPF  method. 

There  are  a  nuier  of  other  approximation  techniques.  Song’s 
asymptotic  analysis  [Bong  1975]  allows  system  performance 
measures  to  be  expressed  as  simple  functions  of  model  parameters, 
provided  that  the  number  of  customers  is  large.  Certain  response 
time  problems  have  been  tackled  using  this  approach,  which  will 
be  mentioned  again  in  Chapter  4.  The  diffusion  approximation, 
nicely  summarized  in  [  Gelenbe  S  Pujolle  1976],  has  received 
considerable  attention;  some  recent  results  due  to  Yu  [1977]  will 
be  discussed  in  Chapter  5.  As  well,  Sevcik  [1977]  has 
successfully  analyzed  two-class  priority  scheduling  disciplines 
by  bounding  their  performance  above  and  below  with  models 
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possessing  local  balance.  These  last  two  approaches  are  of 
independent  interest,  but  are  not  directly  related  to  the 
problems  at  hand. 

This  concludes  our  discussion  of  queueing  network  models  and 
their  solution  techniques.  The  next  section  provides  a  brief 
introduction  to  Laplace  transforms,  which  will  be  used  frequently 
in  the  chapters  that  follow. 

A  Laplace  Transform  Primer 

Both  Laplace  transforms  and  their  discrete  analogues,  z- 
transforms,  will  be  used  in  deriving  the  results  of  this  thesis. 
This  section  provides  an  informal  guide  to  the  notation  and 
derivations  that  follow,  and  is  not  intended  to  be  comprehensive, 
even  within  the  limited  scope  of  transforms  as  applied  to 
queueing  theory.  For  further  information,  the  reader  should 
consult  Kleinrock’s  Appendix  I  [1975]  or  the  text  by  Bidder 
[  1  946  ]. 

Transforms  inevitably  arise  in  the  study  of  physical  systems. 
The  reason  is  twofold:  they  occur  naturally  (as  Kleinrock 
demonstrates,  they  describe  the  effect  of  a  linear,  time- 
invariant  system  on  its  eigenfunctions),  and  they  greatly 
simplify  certain  calculations. 

As  implied  by  its  name,  a  transform  is  used  to  shift  a 
problem  from  its  original  domain  into  one  in  which  a  solution  is 
easier  to  obtain. 

problem  solution 

roblem  A 

omain 

\ 

I 
I 

transformed 
domain  B 

Figure  8 

The  objective  is  to  get  from  A  to  D  in  Figure  8.  Transforms 
provide  a  convincing  demonstration  that  the  shortest  distance 
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A 

1 
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between  two  points  is  not  necessarily  a  straight  line.  Of 
course,  not  only  must  B  ->  C  be  easier  to  traverse  than  A  ->  D, 
but  the  transformation  (A  ->  B)  and  its  inverse  (C  ->  D)  must  be 
efficient,  as  well.  Further,  the  mappings  must  be  unique. 

The  Laplace  transform  is  a  member  of  the  class  of  integral 
transforms;  its  kernel  is  the  negative  exponential  function.  The 
Laplace  transform  of  a  function  f  (x)  is  defined  as 

o'*  -SX 

F*[s]  =  f(x)dx 

Although  the  parameter  of  the  transform  function,  s,  is  in 
general  complex,  we  will  only  have  occasion  to  consider  real 
values.  Should  f  (x)  be  a  probability  density  function,  as  will 
be  the  case  throughout  this  thesis,  then  0  <  F*[s  ]  <  1. 

It  should  be  clear  that  the  mapping  A  ->  B  in  Figure  8  is 
simple  (provided,  of  course,  that  f (x)  is  integrable  and  grows  no 
faster  than  an  exponential).  Let  us  noi  investigate  just  a  few 
of  the  properties  of  Laplace  transforms  that  typically  make  the 
mapping  B  ->  C  simple,  as  well.  (To  the  left  is  the  specified 
operation  expressed  in  terms  of  distributions;  to  the  right,  its 
equivalent  in  the  transform  domain.) 

□  Convolution 
t 

f(t-x)  g(x)  dx  :  F*[s]  G*[s] 

0 

n  Probabilistic  selection 
a  f(t)  ♦  b  g  (t) 

n  Differentiation 

d  f(t) 
dt 

As  an  example,  having  calculated  that  the  Laplace  transform 
of  an  exponential  server  is  equal  to  L/(s  +  L)  where  L  is  the 
parameter  of  the  exponential  distribution  (the  inverse  of  the 
mean  service  time) ,  it  is  simple  to  determine  the  Laplace 
transform  of  the  F-stage  generalized  Erlang  server  illustrated  in 
Figure  5; 


a  F*[  s  ]  ♦  b  G  *[  s  ] 

s  F*[  s  ] 


L 

P - 

s+L 
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Unfortunately,  the  mapping  C  ->  D  often  is  considerably  more 
difficult.  The  moaents  of  a  distribution  are  easily  obtained 
froa  its  Laplace  transform  by  repeated  differentiation  according 
to 


where  the  parenthesized  superscript  denotes  differentiation.  In 
order  to  obtain  an  expression  for  the  probability  density 
function,  though,  an  explicit  inversion  must  be  performed.  There 
are  two  basic  approaches.  The  first,  the  inspection  method, 
involves  a  partial  fraction  expansion  of  the  transform  into  terms 
whose  inverses  can  be  found  in  tables  of  transform  /  inverse 
pairs.  The  second,  the  integral  method,  involves  the  analytic  or 
numerical  evaluation  of  the  inversion  integral. 

In  this  thesis,  as  is  often  the  case,  inversion  is  not 
reguired;  the  solution  in  the  transformed  domain  provides 
sufficient  information.  He  will  terminate  our  discussion  of 
Laplace  transforms  here,  and  resume  it  when  they  are  encountered 
in  future  chapters. 


3.  The  Use  of  Percentiles  in  Characterizing 
Service  Tiae  Distributions 


Success  stories  and  "doesn't  matter"  results  to  the  contrary, 
there  are  tines  when  representing  the  CPU  as  an  exponential 
server  in  a  gueueing  network  model  results  in  unacceptable  error 
in  matching  or  predicting  various  performance  measures.  Should 
this  occur,  the  modeller  will  usually  turn  to  a  f irst-come-f irst- 
served  scheduling  discipline  with  a  two-stage  hyperexponential 
server  matching  the  mean  and  variance  of  the  observed  service 
time  distribution.  The  third  constraint  required  for  unique 
parameter  selection  typically  is  chosen  arbitrarily. 

This  chapter  develops  a  new  approach  to  characterizing 
general  service  time  distributions  in  queueing  network  models. 
Attempting  to  match  several  higher  moments  of  these  distributions 
is,  in  fact,  not  only  insufficient,  but  also  unnecessary  and 
frequently  misleading.  A  model  of  the  IBM  System/370  Model  165 
at  the  University  of  Toronto  Computer  Centre  is  used  to 
illustrate  the  magnitude  of  this  error.  I  demonstrate  both 
theoretically  and  empirically  that  attempting  to  match  the 
cumulative  distribution  function  of  the  observed  service  time 
distribution,  or,  alternatively,  its  Laplace  transform  evaluated 
at  certain  specific  points,  is  a  superior  approach.  I  introduce 
a  three-stage  Cox-type  server  whose  probability  density  function 
resembles  those  of  empirically  observed  service  time 
distributions,  and  describe  experiments  involving  a  heuristic 
that  selects  parameters  for  this  server  based  upon  the  location 
of  certain  percentiles  of  the  observed  distribution.  I  also 
describe  more  sophisticated  parameter  selection  procedures  that 
can  be  effectively  used  to  achieve  arbitrary  accuracy. 

The  results  in  this  chapter  constitute  an  improved 
methodology  for  representing  general  service  time  distributions 
in  queueing  network  models  of  computer  systems. 
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The  Occasional  Inadequacy  of  Exponential  Servers 


Considering  their  inherent  limitations,  simple  queueing 
network  models  using  exponential  servers  have  demonstrated  a 
remarkable  ability  to  predict  computer  system  performance 
accurately.  It  is  an  unfortunate  fact  of  life,  though,  that 
significant  errors  sometimes  arise  in  using  them  to  model  even 
seemingly  straightforward  computer  systems.  An  insufficiently 
accurate  characterization  of  the  CPU  service  time  distribution  is 
often  a  major  cause  of  this  error. 


Throughout  this  chapter  I  will  refer  to  a  queueing  network 
model  of  the  IBM  System/370  Model  165  at  the  University  of 
Toronto  Computer  Centre  (UTCC).  The  predictions  of  this  model 
will  te  studied  when  various  servers  are  used  to  represent  the 
CPU  service  time  distribution.  I  exercised  considerable  care  in 
constructing  the  basic  model.  It  is  of  the  central  server  type, 
with  eight  exponential  servers  representing  the  major  I/O  devices 
of  the  actual  system.  Its  parameters  were  determined  using  a 
combination  of  hardware  and  software  monitor  data  and  accounting 
data.  Particular  attention  was  paid  to  the  effect  of  parallelism 
in  the  I/O  subsystem  [Zahorjan  1976].  This  subsystem  model,  once 
developed,  was  reduced  to  a  single  load-dependent  server  via 
Norton's  theorem  [Chandy  et  al.  1975a].  The  structure  of  the 
model  is  illustrated  in  Figure  1  on  the  following  page;  the 
modelling  and  measurement  effort  is  documented  separately 
[ Lazowska  1976  ]. 


The  standard  of  comparison  will  not  be  the  actual  system,  but 
rather  a  detailed  simulation  model  in  which  the  CPU  service  time 
distribution  was  obtained  by  processing  full  interrupt  traces  of 
the  actual  system  in  operation.  Since  this  distribution  includes 
CPU  activity  attributable  to  the  operating  system,  overhead  is 
implicitly  represented.  The  use  of  a  simulation  model  eliminates 
errors  arising  from  structural  aspects  of  the  queueing  network 
model,  allowing  us  to  concentrate  on  errors  due  to  parameter 
selection  and  solution. 


The  performance  measure  used  throughout  is  CPU  utilization, 
typically  the  most  robust  metric.  The  departure  point  for  this 
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study  is  the  observation  that  with  a  multiprogramming  level  of  5, 
the  CPU  utilization  of  the  UTCC  system  is  74%.  Using  an 
exponential  CPU  server  matching  the  observed  mean,  the  gueueing 
network  model  predicts  a  utilization  of  83%.  This  12%  error  is 
unacceptably  large  for  almost  any  application;  as  an  example,  it 
represents  the  capacity  to  process  over  100  three-second  student 
jobs  per  hour. 

The  Failure  of  HE- 2  Moment  Matching 

The  utilization  of  an  M/G/1  queueing  system  depends  only  upon 
the  mean  of  the  service  time  distribution.  The  Pollaczek- 
Khinchin  mean  value  equation  demonstrates  that  the  other  mean 
performance  measure  for  this  system,  mean  time  in  system,  is 
fully  determined  by  the  first  two  moments  of  the  service  time 
distribution  [Kleinrock  1975],  In  applying  queueing  network 
models  to  computer  systems,  it  seems  reasonable  to  generalize 
this  intuition,  characterizing  the  CPU  service  time  distribution 
in  terms  of  its  moments  and  representing  the  CPU  by  the  simplest 
Cox-type  server  that  can  match  the  observed  mean  and  coefficient 
of  variation.  Since  the  coefficient  of  variation  of  empirically 
observed  CPU  service  time  distributions  is  greater  than  one,  this 
server  is  the  two-stage  hyperexponential  (HE-2) ,  illustrated  in 
Figure  2. 


Figure  1 


Figure  2 
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Since  the  two-stage  hyperexponential  is  a  three-parameter 
distr ibution,  using  it  to  match  the  mean  and  variance  of  an 
observed  distribution  results  not  in  a  single  set  of  parameters, 
but  in  a  family  of  parameter  values  constrained  by  the  equations 
for  the  mean  and  variance  of  the  server: 

mean  =  <P)T1  ♦  (1-P)T2 

variance  =  2((P)T12  ♦  (1-P)T22)  -  mean2 

where  T1  is  the  mean  service  time  of  stage  1,  P  is  the  selection 
probability  of  that  stage,  and  T2  is  the  mean  service  time  of 
stage  2. 

In  light  of  the  Pol laczek-Khinchin  equation  for  the  M/G/1 
queue,  it  is  surprising  that  when  the  various  members  of  this 
family  are  used  to  represent  the  CPU  in  a  queueing  network  model 
they  yield  dramatically  different  predictions  for  CPD 
utilization.  For  example,  when  the  family  having  a  mean  of  8.7 
and  a  coefficient  of  variation  of  12.7  (the  observed 
characteristics  of  the  UTCC  CPU  service  time  distribution)  is 
used  in  the  model  described  previously,  the  approximate  range  of 
predicted  CPU  utilizations  is: 


p 

11 

T2 

He  an 

Utilization 

.0005 

3454. 06 

6.95 

8.7 

12.7 

79% 

• 

# 

• 

• 

• 

.0124 

701.  16 

0.00  + 

8.7 

12.  7 

56* 

The  feasible  range  of  values  for  P  is  constrained  by  the 
coefficient  of  variation  of  the  distribution.  Note, 
incidentally,  that  this  example  arose  naturally  in  the  course  of 
a  modelling  effort;  it  was  not  artificially  constructed  to 
achieve  maximum  effect. 

There  is  a  convincing  intuitive  explanation  for  this 
phenomenon.  For  arbitrary  but  fixed  mean  and  variance.  Figure  3 
illustrates  the  behaviour  of  T1  and  T2  as  P  is  varied  over  its 
feasible  range.  The  value  of  T2  varies  almost  linearly  from  the 
mean  of  the  server  (asymptotically,  when  P  is  near  its  lower 
feasible  limit)  to  zero  (again  asymptotically,  as  P  approaches 
its  upper  feasible  limit).  The  value  of  T1  asymptotically 
approaches  the  mean  of  the  server  as  P  nears  its  upper  feasible 
limit. 
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Figure  3 


Figure  4 


Figure  4  illustrates  the  relative  contribution  of  each  stage 
to  the  overall  mean  of  the  server  as  P  is  varied  over  its 
feasible  range.  The  relative  contribution  of  stage  1  is  (P)T1 
divided  by  the  mean  of  the  server;  for  stage  2 ,  the  numerator  is 
( 1-P) T 2.  Notice  that  for  P  near  its  upper  feasible  limit,  the 
contribution  of  stage  2  to  the  overall  mean  is  negligible.  In 
other  words,  in  this  example  essentially  the  entire  utilization 
of  the  CPC  server  is  attributable  to  a  stage  whose  selection 
probability  is  .0124.  With  a  probability  of  .9876,  a  customer 
will  have  a  negligible  service  time  at  the  CPU  server  and  will 
proceed  immediately  to  the  I/O  service  centers.  I/O  queue 
lengths  will  grow,  and  this  congestion  will  decrease  system 
throughput  and  CPU  utilization. 

As  P  decreases,  the  relative  contributions  of  the  two  stages 
first  become  equal,  then  stage  2  dominates  until  finally,  as  P 
approaches  its  lower  feasible  limit,  almost  all  of  the  CPU  server 
activity  is  attributable  to  stage  2.  But  this  stage  now  has  a 
selection  probability  of  nearly  1,  so  CPU  utilization  is  nearly 
as  high  as  for  an  exponential  server. 

This  observation  explains  another  phenomenon  associated  with 
the  two-stage  hyperexponential  server:  as  its  coefficient  of 
variation  decreases,  so  does  the  range  of  CPU  utilizations  that 
can  be  attained.  For  instance,  using  the  same  model  with  a 
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coefficient  of  variation  of  2  instead  of  12.7  results  in  the 
following  range  of  predicted  CPU  utilizations: 


P 

11 

T2 

He  an 

C-lI*. 

Utilization 

.0005 

465. 57 

8.  45 

8.7 

2.0 

83% 

• 

• 

• 

* 

• 

• 

.4000 

21.  75 

0.00  + 

8.7 

2.  0 

74% 

With  this  lower  coefficient  of  variation,  stage  1  does  not 
account  for  all  of  the  CPU  activity  until  its  selection 
probability  reaches  .40.  There  is  greater  inherent  balance  in 
this  server,  so  queue  lengths  at  the  I/O  service  centers  do  not 
become  excessive,  and  CPU  utilization  is  not  as  severely 
affected.  Note,  incidentally,  that  the  observed  CPU  utilization 
of  the  UTCC  system  is  barely  included  in  the  range  of  this 
server. 

The  two  extremes  of  an  HE-2  family  may  be  distinguished  more 
formally  by  considering  their  probability  density  functions. 
Figures  5  and  6  display  the  general  forms  of  the  density 
functions  corresponding  to  HE-2  servers  at  the  low  and  high 
extremes  of  predicted  CPU  utilization,  respectively: 


Figure  5  Figure  6 

In  Figure  5,  corresponding  to  the  low  utilization  server,  the 
percentiles  of  the  distribution  are  concentrated  near  the  origin. 
The  density  function  corresponding  to  the  high  utilization  server 
differs  remarkably,  considering  that  the  first  two  moments  of  the 
distributions  are  identical.  Its  percentiles  are  spread  over  a 
significantly  greater  range. 
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Since  the  actual  CPU  utilization  is  included  in  the  range 
predicted  by  the  HE-2  family  matching  the  observed  mean  and 
variance,  it  is  natural  to  seek  a  third  constraint  that  will 
allow  us  to  select  the  appropriate  set  of  parameters.  Two 
approaches  have  been  used  in  practice.  The  more  common  [Sauer  & 
Chandy  1975]  attempts  to  achieve  a  balanced  server,  in  the  sense 
of  Figure  4,  by  setting  (P)T1  =  (1-P)T2.  This  constraint  is 
somewhat  arbitrary,  and  when  the  HE-2  server  satisfying  it  is 
used  in  the  model,  predicted  CPU  utilization  is  69%. 


The  more  conscientious  approach  attempts  to  match  the  third 
moment  of  the  observed  service  time  distribution,  in  addition  to 
the  first  two.  Although  it  is  not  always  possible  to  achieve 
this  match  with  an  HE-2  server,  the  observed  distribution  of  the 
UTCC  system  can  be  matched  quite  well.  Unfortunately,  though, 
predicted  CPU  utilization  is  64%.  It  is  worth  noting  that  for  an 
HE-2  server  with  fixed  mean  and  variance,  predicted  CPU 
utilization  is  an  increasing  function  of  the  third  moment;  the 
distribution  illustrated  in  Figure  6  has  a  higher  skewness  than 
the  one  illustrated  in  Figure  5. 


The  following  table  summarizes  our  rather  discouraging 
results  to  this  point: 


server  utilization 

system  74% 

exponential  83% 

HE-2 

low  extreme  of 

the  family  56% 

high  extreme  of 

the  family  79% 

matching 

skewness  64% 

equal  relative 

contribution  69% 


percent  error 

+  12 


-24 
+  7 
-13 
-  7 


In  short,  matching  the  first  several  moments  of  an  observed 
service  time  distribution  using  a  two-stage  hyperexponent ial 
server  does  not  appear  to  be  a  fruitful  approach,  unless  the 
modeller  is  willing  to  resort  to  calibration  of  the  model  via 
arbitrary  parameter  modification. 


-33- 


Price*  s  Observation 

Price  [  1976]  discusses  a  derivation  due  to  Jaiswal  [1968] 
concerning  the  M/G/1  queueing  system  with  a  fixed  number  of 
customers,  N.  This  queueing  system  is  of  relevance  here  because 
it  is  equivalent  to  a  closed  queueing  network  with  N  customers  in 
which  the  central  server  has  a  general  service  time  distribution 
with  FCFS  scheduling,  and  the  N  I/O  service  centers  have 
identical  exponential  service  time  distributions  with  no  queueing 
delays.  (Although  the  equivalence  of  these  two  systems  may  not 
be  immediately  obvious,  their  state  transition  representations 
are  identical.)  The  system  may  also  be  viewed  as  a  single 
service  center  with  an  exponential  delay  on  the  feedback  loop 
[Posner  6  Bernholtz  1  968  ], 

Jaiswal  derives  an  expression  for  server  utilization  in  the 
M/G/1  queueing  system  with  a  fixed  number  of  customers  which 
shows  that  utilization  is  a  decreasing  function  of  the  Laplace 
transform  of  the  CPU  service  time  distribution.  (In  Jaiswal's 
result,  the  transform  function  is  evaluated  at  several  specific 
points.)  Price  notes  that  since  the  Laplace  transform  encodes 
all  moments  of  a  distribution,  there  is  no  reason  to  believe  that 
the  first  few  moments  restrict  the  range  of  the  value  of  the 
transform  sufficiently  to  ensure  accuracy.  He  provides  an  HE-2 
example  illustrating  this  point. 

The  implication  of  Price's  observation  is  important  and  has 
not  received  the  attention  it  deserves:  to  ensure  accuracy  in 
the  M/G/1  queueing  system  with  a  fixed  number  of  customers,  it  is 
necessary  and  sufficient  that  at  certain  specific  points  the 
server  have  the  same  Laplace  transform  value  as  the  observed 
service  time  distribution. 

The  Laplace  transform  of  a  service  time  distribution  is  the 
integral  of  the  product  of  its  probability  density  function  and 
the  negative  exponential  function: 

CP 

5  -sx 

e  f  (x)  dx 
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Price  ' s  observation  rings  true,  for  the  dramatically  different 
probability  density  functions  of  the  HE-2  servers  corresponding 
to  opposite  ends  of  the  range  of  predicted  CPU  utilizations  have 
already  been  noted. 


Robustness 

Before  this  observation  can  be  exploited  in  building  more 
general  gueueing  network  models,  its  robustness  must  be  assessed 
by  answering  two  questions:  (1)  Does  the  Laplace  transform 

result  hold  for  queueing  systems  other  than  M/G/1  with  a  fixed 
number  of  customers?  (2)  As  a  practical  matter,  is  it  any  easier 
to  characterize  a  distribution  accurately  in  terras  of  its  Laplace 
transform  than  in  terms  of  its  moments?  These  questions  will 
each  be  addressed  in  turn: 


•  Applicability  to  Other  Queueing  Network  Configurations 

The  H/G/1  queueing  system  with  a  fixed  number  of  customers  is 
a  reasonable  high-level  model  of  a  time  sharing  system:  the 
central  server  represents  the  computer  system  itself,  and  the  N 
I/O  service  centers  represent  interactive  terminals.  For  more 
detailed  studies,  though,  the  model  is  not  sufficiently 
realistic.  For  instance,  should  we  desire  to  represent  a  CPU  and 
its  I/O  devices,  we  are  forced  to  assume  that  all  I/O  devices 
have  the  same  service  time  distribution  and  that  each  is  uniquely 
assigned  to  a  single  customer. 

In  observing  the  behaviour  of  this  queueing  network,  it  is 
apparent  that  the  magnitude  of  the  Laplace  transform  effect 
(i.e. ,  the  range  of  CPU  utilizations  predicted  by  the  various 
members  of  an  HE-2  family  with  a  particular  mean  and  coefficient 
of  variation)  is  dependent  upon  the  relative  utilizations  of  the 
CPU  and  the  I/O  devices.  This  makes  sense;  the  greater  the 
extent  to  which  the  CPU  is  the  bottleneck  device,  the  greater  the 
degree  to  which  its  service  time  distribution  affects 
performance.  (Note  that  if  the  CPU  is  fully  utilized,  its 
departure  process  will  be  identical  to  its  service  time 
distribution.  On  the  other  hand,  if  the  load  on  the  CPU  is 
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extremely  light,  its  departure  process  will  resemble  its  arrival 
process. ) 

Because  of  this,  it  is  reasonable  to  expect  that  I/O  queueing 
may  substantially  diminish  the  magnitude  of  the  effect.  The 
generality  of  the  result  can  be  established  by  considering  the 
other  extreme  of  the  queueing  network  spectrum:  a  system  with  a 
CPfJ  and  a  single  I/O  server. 

Consider  the  H/G/1  queueing  system  with  finite  waiting  room, 
i.e.,  space  for  exactly  N  customers  including  the  one  receiving 
service.  In  this  queueing  system,  customers  arrive  in  a  Poisson 
stream  at  a  fixed  rate  that  is  independent  of  the  number  of 
customers  already  in  the  system.  If  an  arriving  customer  finds  N 
customers  already  present,  however,  it  is  rejected  and  vanishes. 
Lavenberg  [1975],  among  others,  notes  that  this  queueing  system 
is  equivalent  to  a  queueing  network  having  N  customers,  a  CPU 
with  a  general  service  time  distribution  and  FCFS  scheduling,  and 
a  single,  exponential  I/O  service  center  for  which  all  customers 
must  compete.  (As  with  Price’s  observation,  the  equivalence  of 
these  two  systems  may  not  be  immediately  obvious.  Once  again, 
though,  their  state  transition  representations  are  identical. 
Although  the  M/G/1  system  has  a  Poisson  arrival  process,  new 
arrivals  are  rejected  if  there  are  already  N  customers  in  the 
system,  so  the  effective  arrival  rate  is  zero  during  these 
periods.  This  behaviour  is  identical  to  that  of  the  queueing 
net wo  rk. ) 

Of  course,  most  computer  systems  lie  somewhere  between  the 
extremes  delimited  by  the  two  M/G/1  queueing  systems  just 
described:  there  is  more  than  one  I/O  service  center,  but  there 

are  net  so  many  that  queueing  never  occurs.  A  proof  of  the 
Laplace  transform  result  for  the  M/G/1  queueing  system  with 
finite  waiting  room  would  argue  convincingly  that  the  result  is 
valid  over  the  entire  spectrum,  since  the  degree  of  I/O  queueing 
is  the  crucial  factor  in  determining  the  degree  to  which  the  form 
of  the  CPU  service  time  distribution  influences  system 
performance. 

This  proof,  along  with  some  motivational  material,  forms  an 
appendix  to  this  chapter.  In  the  remainder  of  this  section  the 
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general  behaviour  of  the  system  is  discussed.  Figure  7 
illustrates  the  range  of  CPU  utilizations  yielded  by  the  HE-2 
family  with  a  mean  of  8.7  and  a  coefficient  of  variation  of  12.7 
used  in  the  two-server  gueueing  network  model  equivalent  to  the 
M/G/1  queueing  system  with  finite  waiting  room.  This  range  is 
graphed  against  relative  I/O  power,  expressed  as  the  ratio  of  CPU 
service  time  to  I/O  service  time.  (Again,  it  is  not  surprising 
that  both  the  CPU  utilization  and  the  range  decrease  as  the  I/O 
service  center  becomes  the  bottleneck.) 


Relative  I/O  Power 


Figure  7 


Figure  8  illustrates  another  interesting  phenomenon.  It 
shows  the  range  of  predicted  utilizations  as  a  function  of 
multiprogramming  level  (N)  for  fixed  I/O  service  time.  This 
graph  is  similar  to  one  that  Price  plots  for  the  queueing  network 
equivalent  to  the  M/G/1  queueing  system  with  a  fixed  number  of 
customers.  This  phenomenon  is  well  supported  intuitively;  if  the 
multiprogramming  level  is  one,  for  example,  queueing  never 
occurs,  and  only  the  mean  of  the  CPU  service  time  distribution  is 
of  consequence. 


Results  similar  to  those  shown  in  Figures  7  and  8  can  be 
demonstrated  for  queueing  network  models  in  the  middle  of  the 
spectrum:  those  with  a  fixed  number  of  non- identica 1  I/O  ser 

centers  at  which  queueing  occurs.  For  example,  the  model  of 
University  of  Toronto  Computer  Centre  system  exhibits  all  of 
characteristics  of  the  Laplace  transform  effect.  Given  the 
existence  of  these  phenomena  throughout  the  spectrum,  along  w 
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proofs  of  the  Laplace  transform  result  at  either  extreme,  it 
seems  safe  to  assert  that  the  Laplace  transform  result  applies 
throughout. 

•  Characterizing  Distributions  in  Terms  of  Laplace  Transforms 

The  previous  paragraphs  argue  that  throughout  the  central 
server  queueing  network  spectrum,  CPU  utilization  is  a  decreasing 
function  of  the  value  of  the  Laplace  transform  of  the  CPU  service 
time  distribution  evaluated  at  certain  points.  It  has  yet  to  be 
shown,  though,  that  the  result  is  of  any  practical  value.  By 
using  a  sufficiently  complex  Cox-type  server,  enough  moments  of 
an  observed  service  time  distribution  can  be  matched  to  ensure 
accurate  predicted  utilization.  It  is  not  at  all  obvious  that 
accurately  characterizing  a  distribution  in  terms  of  its  Laplace 
transform  is  any  less  complex. 

Return  to  Figures  5  and  6  and  to  the  definition  of  the 
Laplace  transform.  Consider  s  to  be  fixed,  so  that  the  transform 
specifies  a  value  rather  than  a  function.  Because  the  mass  of 
the  Figure  5  density  function  is  concentrated  near  the  origin, 
the  area  under  the  product  curve  of  this  function  and  the 
negative  exponential  will  be  quite  large,  and  the  Laplace 
transform  result  correctly  predicts  that  CPU  utilization  will  be 
quite  low.  The  percentiles  of  the  Figure  6  density  function  are 
spread  over  a  wider  range,  and  the  area  under  the  corresponding 
product  curve  will  be  quite  small,  leading  to  higher  predicted 
utilization.  (These  statements  will  be  quantified  in  a  later 
section. ) 

The  key  to  the  usefulness  of  the  Laplace  transform  result 
lies  in  the  negative  exponential.  Just  as  values  of  the  density 
function  are  multiplied  by  factors  that  decrease  exponentially 
with  distance  from  the  origin,  so  the  significance  of  any  error 
in  the  specification  of  the  density  function  decreases 
exponentially  with  distance  from  the  origin.  In  other  words, 
large  errors  in  the  specification  of  the  density  function,  f(x), 
can  be  tolerated  for  large  x,  while  still  arriving  at  a  Laplace 
transform  value  extremely  close  to  that  of  the  observed 
d  istr ibu tion. 
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Hatching  Percentiles 

In  characterizing  an  observed  service  time  distribution  in 
terras  of  its  Laplace  transform,  we  are  more  concerned  with 
identifying  the  area  in  which  its  mass  is  concentrated  than  with 
its  density  at  a  few  selected  points.  For  this  reason,  the 
cumulative  distribution  function,  F  (x) ,  which  expresses  the 
location  of  the  various  percentiles  of  the  distribution,  is  a 
more  appropriate  measure  than  the  probability  density  function, 
f  (x) .  The  characterist ics  of  the  DTCC  CPU  service  time 
distribution  (expressed  in  milliseconds)  are: 

coefficient  coefficient  percentiles 

mean  of _ variation  of  skewness  10th  25th  5  0th  75 th  90th 

8.70  12.7  3.0  1.2  1.9  3.8  6.8  10.9 

Compared  to  other  distributions  with  an  eguivalently  high 
coefficient  of  variation,  the  observed  distribution  has  a  median 
(the  50th  percentile  will  be  used  for  comparative  purposes)  that 
lies  quite  far  from  the  origin.  The  value  of  its  Laplace 
transform  evaluated  at  any  point  is  therefore  relatively  low,  and 
CPU  utilization  relatively  high.  This  statement  is  supported  by 
the  observation  that  using  the  HE-2  server  at  the  high 
utilization  extreme  of  the  family  matching  the  first  two  moments 
of  the  observed  distribution  (the  HE-2  server  with  a  density 
function  resembling  that  of  Figure  6)  results  in  a  predicted 
utilization  only  slightly  greater  than  the  observed  value. 

To  understand  the  error  that  results  when  skewness  is  used  as 
the  third  constraint  for  parameter  selection,  notice  that  the 
observed  distribution  has  a  relatively  low  coefficient  of 
skewness.  (The  coefficient  of  skewness  equals  the  cube  root  of 
the  skewness,  divided  by  the  standard  deviation.)  tfithin  an  HE-2 
family,  skewness  is  proportional  to  the  location  of  the  median 
(and  thus  inversely  proportional  to  P) .  For  the  family  with  the 
observed  mean  and  variance,  the  attainable  range  for  the 
coefficient  of  skewness  runs  from  2.7  to  4.5,  and  binding  the 
parameters  by  matching  skewness  commits  us  to  a  distribution  with 
relatively  low  median,  high  Laplace  transform  values,  and  low 
predicted  utilization. 
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It  is  worth  mentioning 


distribution 
the  spectrum: 
In  this  case, 
have  resulted 
too  high. 


could  just  as 
low  median , 
both  methods 
in  predicted 


that  the  observed  service  time 
easily  have  occupied  the  other  end  of 
high  skewness,  and  low  utilization, 
of  binding  the  HE-2  parameters  would 
utilizations  that  were  substantially 


Figure  9  roughly  illustrates  the  density  function  of  the  UTCC 
CPU  service  time  distribution.  Its  most  distinctive 
characteristic  is  that  the  density  is  zero  at  the  origin. 

Clearly  an  HE-2  server,  which  has  its  greatest  density  at  the 
origin,  will  not  be  able  to  provide  a  close  match  in  this 
critical  area. 


Figure  9  Figure  10 

Accordingly,  this  section  considers  a  three-stage  Cox-type 
server  with  more  suitable  inherent  characteristics.  This  server 
is  illustrated  in  Figure  10.  (In  the  next  section,  we  return  to 
the  HE-2  server  and  attempt  to  match  percentiles  using  it.  The 
error  in  predicted  utilization  is  surprisingly  small.) 

I  have  developed  a  simple  heuristic  that  attempts  to  match 
the  mean  and  several  percentiles  of  an  observed  distribution  by 
varying  the  parameters  of  the  server  illustrated  in  Figure  10. 
There  are  thus  three  free  parameters;  the  fourth  parameter  is 
adjusted  to  maintain  the  mean.  The  heuristic  biases  in  favour  of 
percentiles  located  near  the  origin,  both  in  selecting  those 
percentiles  to  be  matched  and  in  setting  the  error  tolerance. 
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Alt  hou  gh  approximating  a  density  function  by  means  of  percentiles 
generally  requires  that  a  large  number  of  percentiles  be 
specified  where  the  slope  of  the  density  function  differs  most 
from  zero,  the  fact  that  we  are  working  with  a  server  whose 
distribution  has  certain  inherent  nice  properties  (e.g., 
continuity)  minimizes  the  impact  of  this  requirement. 

The  heuristic  used  to  match  F(x)  is  not  interesting  enough  to 
describe  in  detail.  (A  more  sophisticated  approach,  which 
attempts  to  match  the  Laplace  transform  of  the  observed 
distribution  directly,  will  be  discussed  in  a  subsequent 
section.)  The  value  of  x  at  which  a  particular  cumulative 
percentage  occurs  is  monotone  in  each  free  parameter  except  at 
one  extreme.  It  is  important  to  realize  that  we  are  not  solving 
a  linear  system;  the  existence  of  three  free  parameters  does  not 
imply  that  three  cumulative  percentages  can  be  matched.  For  some 
observed  distributions,  five  points  can  be  matched,  each  to 
withir  5%;  for  others,  the  best  fit  to  three  points  has  a  maximum 
error  in  excess  of  15%.  For  a  particular  cumulative  percentage 
of  the  observed  distribution,  F(x),  the  error  is  measured  as  the 
difference  between  F(x)  and  the  cumulative  percentage  that 
results  when  the  cumulative  distribution  function  of  the  server 
is  evaluated  at  x. 

The  observed  distribution  of  the  UTCC  system  can  be  matched 
quite  well  by  the  server.  When  a  three-percentile  fit  is 
attempted,  the  x-values  corresponding  to  the  25th,  50th  and  75th 
percentiles  yield  cumulative  percentages  of  .252,  .518  and  .722, 
respectively.  At  the  expense  of  some  accuracy  at  these 
percentiles,  a  five-percentile  fit  that  includes  the  15th  and 
90th  can  be  achieved. 

When  the  model  is  evaluated  using  either  of  these  servers, 
the  predicted  utilization  of  the  CPU  is  75%,  an  error  of  roughly 

It. 
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Furt  her  Experiments 

A  number  of  experiments  have  been  performed  that  add  weight 
to  the  success  and  potential  of  this  new  approach  to 
characterizing  general  service  time  distributions: 

•  Matching  the  Percentiles  of  Other  Observed  Distributions 

In  discussing  the  UTCC  CPU  service  time  distribution,  a 
contrasting  distribution  was  postulated:  one  with  low  median, 
high  skewness,  and  low  utilization.  Such  a  distribution  was 
constructed,  with  mean  and  variance  reasonably  close  to  those  of 
the  observed  distribution.  Its  characteristics  are: 

coefficient  coefficient  percentiles 

mean  of  variation  of _ske vne ss  10th  25th  50th  75th  90th 

8.73  11.3  4.5  0.5  0.6  0.9  1.2  1.7 

This  distribution  will  subsequently  be  referred  to  as  D2,  and  the 
observed  distribution  will  be  referred  to  as  D1. 

When  D2  is  employed  in  the  simulation  model,  CPU  utilization 
is  61 %.  The  mean  and  variance  of  D2  are  sufficiently  close  to 
those  of  D 1  that  the  utilizations  predicted  by  the  various  HE-2 
servers  can  be  taken  from  the  table  that  appeared  earlier.  No 
HE-2  server  is  able  to  match  all  of  the  first  three  moments  of 
this  distribution;  the  high  extreme  of  the  family  matching  the 
first  two  moments  comes  closest,  and  results  in  an  error  of  30% 
in  predicted  CPU  utilization. 

The  heuristic  is  not  able  to  find  a  particularly  close  match 
to  D2.  Using  a  four-percentile  fit,  the  x-values  corresponding 
to  the  25th,  50th,  75th  and  90th  percentiles  yield  cumulative 
percentages  of  .298,  .422,  .600  and  .775,  respectively. 
Nonetheless,  when  the  analytic  model  is  evaluated  using  this 
server,  predicted  CPU  utilization  is  60%,  an  error  of  less  than 
2%. 
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•  Matching  Percentiles  Using  an  HE-2  Server 

Because  the  density  function  of  a  two-stage  hyperexponential 
server  has  its  greatest  value  at  the  origin,  it  is  impossible  to 
obtain  a  close  fit  to  typical  CPU  service  time  distributions, 
which  have  zero  density  there.  Nonetheless,  I  have  attempted  to 
match  the  mean  and  three  percentiles  of  both  D1  and  D2  using  HE-2 
servers. 

For  the  HE-2  matching  D1,  the  x-values  corresponding  to  the 
10th,  25th  and  50th  percentiles  yield  cumulative  percentages  of 
.169,  .253  and  .443,  respectively.  Surprisingly,  when  the  model 

is  evaluated  using  this  server,  CPU  utilization  is  77%,  an  error 
of  only  4%. 

For  the  HE-2  matching  D2,  the  respective  cumulative 
percentages  are  .283,  .330  and  .451.  Predicted  CPU  utilization 
using  this  server  is  60%,  an  error  of  less  than  2%. 

These  two  experiments  illustrate  the  robustness  of  the 
technique.  Note,  incidentally,  that  the  error  in  matching  D2  is 
not  as  severe  as  appears  at  first  glance.  The  slope  of  the 
cumulative  distribution  function  is  so  great  in  this  range  that 
the  x-value  of  each  percentile  and  the  value  of  the  density 
function  there  are  only  slightly  in  error.  The  forthcoming 
discussion  of  more  sophisticated  approaches  to  parameter 
selection  will  shed  additional  light  on  this. 


•  The  Apparent  Unimportance  of  Variance 

There  is  only  a  limited  correlation  between  the  variance  of 
the  observed  CPU  service  time  distribution  and  the  variance  of 
the  servers  that  yield  the  correct  CPU  utilization. 

First  consider  D1,  which  has  a  coefficient  of  variation  of 
12.7.  The  three-stage  server  matching  three  percentiles  has  a 
coefficient  of  variation  of  2.0,  while  the  one  matching  five 
percentiles  has  a  coefficient  of  variation  of  2.4.  The  HE-2  that 
yields  a  4%  error  has  a  coefficient  of  variation  of  35.7. 
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D 2  has  a  coefficient  of  variation  of  11,3.  The  three-stage 
server  matching  it  has  a  coefficient  of  variation  of  110,  while 
the  HE-2  matching  it  has  a  coefficient  of  variation  of  117. 

Further  evidence  comes  from  experiments  with  the  three-stage 
server  matching  D2.  By  varying  P  (and  maintaining  the  mean  by 
varying  T3  as  well),  it  is  possible  to  radically  alter  the 
coefficient  of  variation  of  the  server  without  affecting  its 
percentiles  substantially.  As  the  coefficient  of  variation  is 
varied  from  12  to  400,  CPU  utilization  drops  from  59.2%  to  58.3%, 
a  truly  insignificant  change. 

Of  course,  predicted  CPU  utilization  is  not  unrelated  to  the 
coefficient  of  variation  of  the  CPU  server.  The  two  are 
inversely  proportional  in  the  following  limited  sense: 

Consider  a  gueueing  network  model  in  which  the  CPU  is 
represented  by  an  HE-2  server  with  mean  M  and  coefficient  of 
variation  V.  Let  the  predicted  CPU  utilization  of  this  model  be 
U.  There  exists  an  HE-2  server  with  mean  M*  =  M  and  coefficient 
of  variation  V'  <  V  that,  when  used  in  the  model,  results  in  a 
predicted  CPU  utilization  U'  >  U. 

•  Other  Performance  Measures 

Because  of  the  manner  in  which  performance  measures  are 
calculated  for  queueing  network  models,  an  improvement  in  the 
accuracy  of  predicted  CPU  utilization  will  result  in  a 
corresponding  improvement  in  the  predicted  utilization  of  the 
other  service  centers  and  in  system  throughput.  It  is 
reassuring,  though,  to  note  that  characterizing  the  CPU  service 
time  distribution  using  the  techniques  advocated  in  this  chapter 
also  results  in  an  improved  prediction  for  mean  CPU  queue  length. 

The  simulation  model  of  the  TJTCC  system  has  a  mean  CPU  queue 
length  of  2.31.  The  family  of  two-stage  hyperexponential  servers 
predicts  a  range  of  queue  lengths  from  2.20  (at  the  high 
utilization  end  of  the  spectrum)  to  2.72.  The  three-stage  Cox- 
type  server  predicts  2.28,  while  the  HE-2  with  parameters 
selected  on  the  basis  of  percentiles  predicts  2.23. 
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Parameter  Selection  Techniques  of  Greater  Sophistication 


Based  upon  the  preceding  results,  the  objective  in 
representing  the  CPU  in  a  central  server  queueing  network  model 
is  to  select  a  server  with  certain  Laplace  transform  values  equal 
to  those  of  the  observed  service  time  distribution.  At  one 
extreme  of  the  queueing  network  spectrum  (equivalent  to  the  M/G/1 
queueing  system  with  finite  waiting  room)  the  transform  is 
evaluated  at  a  single  point,  L  (the  inverse  of  the  mean  I/O 
service  time) .  At  the  other  extreme  (equivalent  to  the  M/G/1 
queueing  system  with  a  fixed  number  of  customers)  the  transform 
is  evaluated  at  the  N  points  nl,  C  <  n  <  N-1,  where  N  is  the 
number  of  customers  in  the  system  and  L,  once  again,  is  the 
inverse  of  the  mean  I/O  service  time. 


The  simple  heuristic  used  for  parameter  selection  in  the 
modelling  experiments  of  previous  sections  can  best  be  described 
as  an  indirect  approach  to  this  objective,  since  it  seeks  to 
match  the  Laplace  transform  by  matching  the  percentiles  of  the 
observed  distribution  lying  near  the  origin.  This  section 
describes  two  alternative  approaches  to  parameter  selection.  The 
first  is  an  improved  indirect  method,  based  on  work  by  Bux  and 
Herzog.  The  second  is  a  direct  method;  in  ether  words,  it 
selects  a  server  on  the  basis  of  its  Laplace  transform.  The  goal 
of  this  section  is  to  specify  concretely  a  parameter  selection 
methodology  that  exploits  the  results  of  the  chapter. 


Numerical  Evaluation  of  the  Laplace  Transform 


Regardless  of  whether  an  indirect  or  direct  parameter 
selection  technique  is  employed,  the  first  step  is  to  evaluate 
numerically  the  Laplace  transform  of  the  observed  service  time 
d istr ibut ion . 


In  general,  this  task  can  be  rather  tricky.  A  large  measure 
of  the  difficulty  can  be  eliminated  by  working  in  terms  of  the 
cumulative  distribution  function  of  the  service  time 
distribution,  F(x)1,  since  it  is  smoother  than  the  probability 
density  function,  f(x).  This  presents  no  difficulty,  for  if 
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A*[  s]  is  the  Laplace  transform  of  F  (x) ,  then  s  A*[s]  is  the 
Laplace  transform  of  f (x)  . 

The  task,  then,  is  to  determine  the  value  of  the  Laplace 
transform  of  the  service  time  distribution 


for  certain  values  of  s.  Composite  Gauss  quadrature  can  be  used 
to  evaluate  the  integral,  as  follows: 

□  Determine  the  values  of  s  that  must  be  considered. 

a  Find  the  upper  limit  of  integration  that  is  required  to 

ensure  sufficient  accuracy  over  this  range  of  s. 

n  Divide  the  interval  of  integration  into  sub-intervals,  and 
use  a  low-order  Gauss  formula  to  integrate  each  sub-interval. 

In  each  of  the  M/G/1  queueing  systems  considered  previously, 
the  Laplace  transform  of  the  CPU  service  time  distribution  was 
evaluated  at  points  corresponding  to  the  throughput  rates  of  the 
I/O  subsystem  with  various  numbers  of  customers  present. 
Generalizing  from  this,  the  load-dependent  throughput  rates  of 
the  I/O  subsystem  (in  the  sense  of  Norton’s  theorem)  are  the 
appropriate  values  of  s  at  which  to  evaluate  the  transform  for 
systems  between  these  two  extremes. 

The  smallest  non-zero  value  of  s,  namely  L,  will  require  the 
largest  interval  of  integration,  since  B*[0]  equals  one 
regardless  of  the  form  of  F(x).  Knowing  L  and  the  rate  at  which 
F  is  asymptotic  to  one,  a  safe  upper  limit  for  the  interval  can 
easily  be  determined.  The  width  of  the  sub-intervals  can  be 
allowed  to  increase  with  increasing  x. 

In  practice,  I  do  not  concern  myself  with  tuning  the 
integration  to  the  specific  system  being  analyzed  (except,  of 
course,  for  scaling  with  respect  to  the  mean).  I  generally  use 
five  sub-intervals,  each  twice  the  width  of  the  one  preceding  it. 
This  requires  that  F(x)  be  specified  at  fifteen  points  when  a 
three  point  Gauss  formula  is  employed  in  each  sub-interval.  For 
the  service  time  distributions  and  ranges  of  s  that  I  have 


-46- 


encountered,  this  provides  accuracy  to  the  fourth  decimal  place. 
Alternatively,  should  existing  measurement  data  specify  F  (x)  at 
pre-deter mined  values,  the  sub-intervals  can  be  selected  to 
coincide.  In  either  case,  a  modicum  of  cleverness  in  programming 
the  integration  makes  it  possible  to  estimate  the  accuracy  of  the 
solution,  as  well  as  its  sensitivity  to  the  particular  points  at 
which  F  (x )  is  specified. 


As  an  example,  the  following  table  displays  the  calculated 
values  of  the  Laplace  transform  of  D1,  the  observed  CPU  service 
time  distribution  of  the  University  of  Toronto  Computer  Centre 
system.  The  transform  is  evaluated  at  several  points  between 
zero  and  one.  For  comparison,  the  table  also  displays  the 
corresponding  Laplace  transform  values  of  D2,  which  has 
approximately  the  same  mean  and  variance  as  D1: 


s 

.  0 
.  2 
.4 
.  6 
.  8 
1.0 


1 


D  1_ 

.000 
.  463 
.276 
.  184 
.130 
.  096 


B*[s] 

1 


D2 

.000 

.808 

.  671 
.  561 
.471 

.398 


To  evaluate  a  transform  at  twenty  points  requires  one  second  of 
CPU  time  using  a  Gauss-Legend  re  quadrature  routine  from  the  STUNT 
package  [Johnston  &  Addison  1977], 

The  exercise  of  numerically  evaluating  B*[s]  is  worthwhile 
regardless  of  whether  an  indirect  or  direct  parameter  selection 
technique  is  employed.  In  the  former  case,  it  indicates  the 
points  at  which  F  (x)  should  be  matched  in  order  to  achieve  the 
greatest  accuracy.  In  the  latter  case,  it  provides  the  data 
against  which  the  Laplace  transform  values  of  the  server  will  be 
compared . 


Matching  Percentiles  Using  Linear  and  Non-Linear  Programming 

The  approach  most  often  proposed  to  select  parameters  for 
Cox— type  servers  based  upon  the  measured  characteristics  of 
distributions,  namely  Prony's  method  [Cox  1955],  suffers  from  a 
number  of  severe  disadvantages  [ Bux  &  Herzog  1977a].  Leroudier 
and  Schroeder  [1974]  have  outlined  a  maximum  likelihood  approach 
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to  parameter  selection  for  the  general  Cox  server,  and  have 
demonstrated  its  use  by  matching  the  probability  density 
functions  of  several  distributions  arising  in  an  IBM  CP/CMS 
system  at  the  University  of  Grenoble.  More  recently,  Bux  and 
Herzog  [1977b]  have  described  a  technique  that  selects  parameters 
for  a  restricted  form  of  the  general  Cox  server  based  upon  the 
mean,  the  second  moment,  and  an  arbitrary  number  of  points  on  the 
cumulative  distribution  function  of  an  observed  service  time 
d  istr ibution. 

The  server  used  by  Bux  and  Herzog,  illustrated  in  Figure  11, 
consists  of  R  exponential  stages  in  series,  each  with  the  same 
mean  service  time,  T.  After  completing  service  at  stage  r,  1  <  r 
<  R- 1 ,  a  customer  either  leaves  the  server,  with  probability  Pr, 
or  enters  service  at  stage  r+1,  with  probability  1-Pr. 


Figure  11 

Bux  and  Herzog  first  demonstrate  that  this  server  shares  with 
the  general  Cox  server  the  important  property  that  it  can 
approximate  any  distribution  with  a  rational  Laplace  transform 
arbitrarily  closely.  They  then  outline  an  algorithm  to  perform 
the  following  approximation  task: 

Given  the  J  values  of  the  observed  service  time 

distribution’s  cumulative  distribution  function,  F(x),  evaluated 

at  points  x  ,  ...  ,x  ,  as  well  as  the  first  and  second  moments  of 

J 

the  service  time  distribution,  E[ F]  and  E[F2],  select  parameters 
R,  T  and  Pr  (1<r<R-1)  for  the  server  illustrated  in  Figure  11 
such  that  the  following  constraints  are  satisfied  by  its 
cumulative  distribution  function,  G  (x) : 
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G(X  .) 

3 

< 

F(x.)  +  « 

1  j 

1<j<J 

G(x  .) 

3 

> 

F(X.)  -  (i 

3  j 

1<j<J 

E[G] 

— 

E[Fj 

EfG^l 

— 

E[  ] 

where  the  J  values  of  f)  and  0  are  specified. 

As  discussed  in  a  previous  section,  this  approximation  task 
is  not  a  linear  problem.  Bux  and  Herzog  proceed  as  follows: 

□  Set  R  to  the  minimum  possible  value,  based  upon  the  first  and 
second  moments  of  the  observed  service  time  distribution. 

□  For  a  particular  value  of  T,  use  a  linear  optimization 
technique  to  select  the  optimum  values  of  the  Pr. 

□  Find  the  optimum  value  of  T  by  using  a  non-linear 
optimization  technique. 

n  If  the  constraints  are  not  satisfied,  increment  R  by  one  and 
iterate. 


When  combined  with  knowledge  concerning  the  specific  poin 
of  the  cumulative  distribution  function  that  should  be  matcha 
the  Bux  and  Herzog  technique  offers  a  well-formulated  alterna 
to  the  simple  heuristic  used  in  previous  sections.  Its  princ 
advantage  is  the  assurance  that  the  constraints  are  satisfied 
its  principal  disadvantages  are  the  potentially  high  cost  of 
algorithm  outlined  above,  and  the  potentially  large  number  of 
stages  that  may  be  introduced.  Note  that  the  constraint 
regarding  the  second  moment  can  be  eliminated;  it  arose  from 
Pollaczek-Khinchin  M/G/1  intuition  [Herzog  1977],  which  has  b 
shown  to  be  inapplicable  in  a  queueing  network  context.  Howe 
it  is  useful  to  include  the  mean  of  the  distribution  as  an 
explicit  constraint. 
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•  Matching  the  Laplace  Transform  Directly 

Since  the  effect  of  the  CPD  service  time  distribution  in  a 
central  server  queueing  network  model  is  entirely  captured  by 
value  cf  its  Laplace  transform  at  certain  points,  the  most 
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straightf orward  means  of  characterizing  an  observed  distribution 
is  to  select  a  server  directly  on  the  basis  of  its  corresponding 
Laplace  transform  values.  One  major  advantage  of  this  approach 
is  the  direct  correlation  between  the  accuracy  of  the  Laplace 
transform  approximation  and  the  accuracy  of  the  server  when  used 
in  a  queueing  network  model. 


The  Laplace  transform  of  any  Cox-type  server  is  easily 
expressed  in  terms  of  its  parameters.  For  illustrative  purposes, 
consider  the  three-stage  server  shown  in  Figure  10,  which  has  the 
proper  balance  of  simplicity  and  sufficiency.  Its  Laplace 
transform  is 


1/T 1  P/T2  (1-P)/T3 

C*[s]  = - {  -  + - } 

s  +  1/T1  s  +  1/T 2  s  +  1/T  3 

The  goal  is  to  select  parameters  T1 ,  T2,  T3,  and  P  for  this 
server,  such  that  its  mean  and  the  value  of  its  Laplace  transform 
at  certain  points  match  the  corresponding  features  of  the 
observed  service  time  distribution.  To  select  the  parameter 
values,  a  weighted  non-linear  least  squares  approximation 
technique  is  appropriate.  Such  a  procedure  requires  that  the 
partial  derivatives  of  C*[sj  with  respect  to  each  of  its 
variables  be  provided;  these  are  easily  obtained. 


Since  evaluations  of  the  Laplace  transform  of  the  observed 
service  time  distribution  come  virtually  for  free  once  the  values 
of  its  cumulative  distribution  function  have  been  provided,  and 
since  the  performance  of  the  least  squares  method  improves  when  a 
large  number  of  observations  are  available,  it  is  preferable  to 
compare  the  transforms  of  the  server  and  the  observed 
distribution  over  a  wider  range  than  simply  the  load- dependent 
throughput  rates  of  the  I/O  subsystem.  For  the  UTCC  model,  the 
ten  equally  spaced  points  from  0.1  to  1.0  are  appropriate.  Since 
the  mean  of  a  function  is  equal  to  the  negative  of  the  first 
derivative  of  its  Laplace  transform  evaluated  at  zero,  it  is 
possible  tc  ensure  that  the  parameters  selected  will  result  in  a 
server  with  the  correct  mean  by  including  one  artificial 
observation:  a  heavily  weighted  point  at  s  =  .0001,  with  a  value 
selected  so  that  the  slope  of  the  Laplace  transform  between 
and  .0001  is  equal  to  the  negative  of  the  observed  mean. 


zero 
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Using  a  non-linear  least  squares  routine  from  the  BMD  package 
[Dixon  1973],  a  set  of  parameters  that  results  in  a  maximum 
transform  error  of  under  2%  with  respect  to  D1,  the  observed  CPU 
service  time  distribution  of  the  Toronto  system,  can  be  found  in 
less  than  three  seconds  of  CPU  time.  The  table  that  follows 
compares  the  characteristics  of  this  server  to  those  of  the 
observed  service  time  distribution: 

observed  three-stage 

distribution  server 


mean 

8.7 

8.7 

C 

.  V. 

12.7 

3.2 

B* 

[  .0] 

1.000 

1 . 000 

B* 

•  .2' 

.463 

.  464 

B* 

'  .4' 

.276 

.276 

B* 

.  6 ' 

.  184 

.  183 

B* 

.8“ 

.  130 

.131 

B* 

‘  1.  ' 

.096 

.  098 

When  the  server  with  these  parameters  is  used  in  the  model  of  the 
Toronto  system,  predicted  CPU  utilization  is  74%,  precisely  the 
observed  value. 

The  Laplace  transform  of  D2,  the  distribution  complementary 
to  D1 ,  was  also  evaluated  and  matched  using  the  techniques 
described  in  this  section.  When  the  parameters  selected  are  used 
in  the  model,  predicted  CPU  utilization  is  61%,  once  again 
precisely  the  observed  value. 

This  parameter  selection  methodology  constitutes  a  specific 
and  well-founded  embodiment  of  the  results  of  the  chapter.  Its 
extension  to  arbitrarily  complex  Cox-type  servers  will  be 
discussed  in  Chapter  5. 


Summary 

The  need  in  analytic  modelling  for  non-exponential  service 
time  distributions  with  f irst-come-f irs t-served  scheduling 
disciplines  is  well  established.  There  are  occasions, 
illustrated  by  my  own  attempts  to  model  the  University  of  Toronto 
Computer  Centre  system,  when  other  service  centre  representations 
simply  are  not  accurate  enough. 

The  work  described  in  this  chapter  demonstrates  that  the 
intuition  derived  from  the  Pollaczek-K h inchin  equation  for  the 
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M/G/1  gueueing  system  is  not  applicable  in  a  gueueing  network 
context.  Matching  the  second  or  even  third  moment  of  an  observed 
service  time  distribution  is  both  insufficient  and  inefficient 
(in  the  sense  of  obtaining  the  greatest  modelling  accuracy  for  a 
given  level  of  server  complexity) .  Of  course,  such  models  can  be 
calibrated  by  arbitrarily  adjusting  the  parameters  of  the  CPU  or 
the  I/O  servers,  but  such  tampering  can  only  diminish  confidence 
in  the  predictive  ability  of  the  model. 

This  chapter  argues  that  in  central  server  queueing  network 
models,  the  effect  of  the  CPU  service  time  distribution  is  fully 
determined  by  the  value  of  its  Laplace  transform  at  several 
points.  This  result  leads  to  parameter  selection  methodologies 
based  upon  matching  either  certain  percentiles  or  certain  Laplace 
transform  values  of  an  observed  service  time  distribution. 

Perhaps  the  most  succinct  explanation  for  the  behaviour  we 
have  observed  follows  from  the  experiments  concerning  the 
apparent  unimportance  of  variance.  One  might  deduce  from  these 
experiments  that  for  the  models  we  have  been  considering,  the 
tails  of  service  time  distributions  do  not  play  an  important  role 
in  the  determination  of  system  performance.  This  is  a  correct 
inference,  and  it  deserves  further  comment  since  it  serves  to 
highlight  a  theme  that  will  recur  in  Chapter  4:  finite  and 
infinite  population  systems  have  decidedly  different 
charac  ter ist ics. 

Buzen  [1977]  has  explained  this  phenomenon  in  terms  of  a 
•'limited  damage”  argument:  in  a  closed  gueueing  network,  the 
performance  degradation  resulting  from  a  customer  with  an 
extremely  long  service  time  is  bounded,  because  at  most  N-1 
customers  (where  N  is  the  multiprogramming  level)  will  queue 
behind  that  customer.  Thus  the  transient  behaviour  of  such  a 
system  is  much  less  volatile  than  that  of  a  similar  open  system. 

This  observation  lies  at  the  heart  of  the  results  in  this 
chapter.  In  essence,  the  moments  of  a  distribution  are  strongly 
influenced  by  its  tail,  while  the  Laplace  transform  of  a 
distribution  tends  to  emphasize  its  form  near  the  origin.  Thus 
in  a  closed  queueing  network,  the  Laplace  transform  provides  a 
superior  characterization  of  the  service  time  distribution. 
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Of  course,  if  an  observed  service  time  distribution  is 
represented  by  a  server  that  Batches  a  large  number  of  moments, 
then  the  percentiles  and  Laplace  transform  values  of  the  observed 
distribution  will  also  be  matched.  Similarly,  if  a  large  number 
of  percentiles  or  Laplace  transform  values  are  matched,  the 
moments  will  also  be  correct.  Approximations  are  necessary  in 
modelling,  though,  and  the  results  in  this  chapter  demonstrate 
that  matching  either  percentiles  or  Laplace  transform  values  is 
more  fruitful  than  matching  moments. 


The  utility  of  the  techniques  developed  here  hinges  on  two 
factors.  The  first  is  the  severity  of  the  error  that  the 
techniques  address;  in  other  words,  the  degree  of  improvement 
that  they  provide.  My  experience  in  modelling  the  University  of 
Toronto  Computer  Centre  system  demonstrates  that  the  improvement 
is  of  clear  practical  significance.  The  Pollaczek-Kh inch  in  mean 
value  equation  indicates  that  the  degree  of  improvement  will 
decrease  as  the  characteristics  of  the  service  centers  in  the 
queueing  network  approach  those  of  individual  M/G/1  servers; 
specifically,  as  the  arrival  process  at  each  service  center 
becomes  less  dependent  on  the  number  of  customers  already 
awaiting  service  there.  Experiments  with  large  multiprogramming 
levels  (greater  than  20)  indicate  that  the  potential  error,  and 
thus  the  potential  improvement  afforded  by  the  techniques 
presented  here,  is  nearly  as  great  as  for  the  UTCC  model.  The 
second  factor  affecting  the  utility  of  these  techniques  is  the 
ease  with  which  they  can  be  applied.  In  Chapter  5,  a  fully 
automatic  parameter  selection  methodology  for  Cox-type  servers  is 
discussed.  Given  the  locations  of  certain  percentiles  of  a 
service  time  distribution,  this  procedure  selects  parameters 
based  on  Laplace  transform  values.  It  can  be  directly  interfaced 
with  a  queueing  network  solution  package. 


Although  models  such  as  those  discussed  in  this  chapter  do 
not  satisfy  the  local  balance  assumptions,  computational 
requirements  still  can  be  kept  to  a  minimum  by  using  Norton's 
theorem  to  reduce  the  I/O  subsystem  to  a  single,  load-dependent 
server.  The  model  of  the  UTCC  system,  for  instance,  was  solved 
in  less  than  one  second  of  CPU  time.  The  techniques  described  in 
this  chapter  might  also  be  used  to  select  server  characteristics 
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for  Shum's  extended  product  form  approximation  method,  a  recently 
developed  alternative  to  Norton’s  theorem  for  obtaining  solutions 
to  queueing  networks  with  general  service  time  distributions 
[Shum  1977  ]. 

In  summary,  the  techniques  presented  in  this  chapter  are 
accurate,  robust  and  practical.  They  have  contributed 
substantially  to  our  understanding  of  the  behaviour  of  our  models 
and  to  our  confidence  in  their  predictive  abilities. 


Appendix:  The  Laplace  Transform  Result  for  the 

M/G/1  Queueing  System  with  Finite  Waiting  Room 

We  seek  an  expression  for  PO,  the  equilibrium  probability 
that  an  M/G/1  queueing  system  with  finite  waiting  room  is  idle. 
This  system  can  accomodate  N  customers,  including  the  one 
receiving  sprvice.  The  arrival  process  is  Poisson  with  a 
constant  rate.  An  arriving  customer  finding  the  system  full  is 
rejected  and  vanishes. 

In  this  system,  as  with  most  variants  of  the  M/G/1  system, 

Pj,  the  equilibrium  probability  that  there  are  j  customers  in  the 
system,  equals  PAj,  the  probability  that  an  arriving  customer 
finds  j  customers  already  in  the  system. 

In  an  M/G/1  system  with  infinite  waiting  room,  PAj  equals 
PDj,  the  probability  that  a  departing  customer  leaves  behind  j 
customers.  This  relationship  does  not  hold  in  the  M/G/1  system 
with  finite  waiting  room,  however,  since  customers  are  sometimes 
rejected.  For  this  latter  system,  PDj,  0<j<N-1,  equals  PAj  | 
j<N-1,  the  probability  that  an  arriving  customer  finds  j 
customers  already  in  the  system,  conditioned  on  the  fact  that 
this  arriving  customer  is  not  rejected. 

The  two  sets  of  probabilities  are  proportional,  and  the  PAj 
can  be  expressed  in  terras  of  the  PDj  by  considering  PAj  for  j 
equal  to  N,  the  probability  that  an  arriving  customer  is 
rejected.  (This  is  observed  by  Cohen  [1969],  among  others.) 

Once  this  probability  is  determined,  the  constant  of 
proportionality,  s,  can  be  computed  by  solving 
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N-1 

(1)  PA  +  s  2  PDj  =  1 

N  j=0 

The  probability  that  an  arriving  customer  is  rejected  is 
equal  to  (p  -  (1-P0))/p,  or  1  -  1/P  +  PO/p,  where  p  is  the  load 
factor  or  traffic  intensity  (the  mean  arrival  rate  divided  by  the 
mean  service  rate) .  Intuitively,  the  numerator  of  this 
expression  equals  the  difference  between  the  presented  load  ( p) 
and  the  accepted  load  (1-PO).  Since  PO  equals  sPDO,  (1)  becomes 


N-  1 

(2)  1  -  1/p  ♦  sPDO/p  ♦  s  2. 

j=0 

and  we  find 


1 


(3) 


g  —  N—  1 

PDO  ♦  p  2  PD j 
j=0 


1 


It  was  stated  earlier  that  PO,  the  equilibrium  probability 
that  an  H/G/1  system  with  finite  waiting  room  is  idle,  equals 
PAO,  which  in  turn  equals  sPDO.  So  from  (3),  we  have 


PDO 


(4) 


PO  =  N-1 

PDO  +  p  2  pPj 
j=0 


Cooper  [1972]  shows  that  the  PDj  are  proportional  to  the 
corresponding  quantities  for  the  M/G/1  system  with  infinite 
waiting  room.  The  generating  function  for  these  latter 
probabilities,  which  we  denote  by  P'j,  can  be  easily  obtained. 
The  constant  of  proportionality,  t,  can  be  found  by  solving 

N-1 

(5)  tjl0P'j  =  1 

So  from  (4)  and  (5),  we  have 


tP'  0 


(6)  PO  =  N-1 

tP»0  +  pt  2  P'j 
j=0 


P*  0 


N-  1 

P'0  t  p  ?  P'j 
j=0 
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But  P'O  equals  1-p,  so 


(7) 


(1-P) 

PO  =  N-1 

o-p)  ♦  p ip'j 
]=0 


Equation  (7)  expresses  the  server  utilization  of  an  M/G/1 
system  with  finite  waiting  room  in  terms  of  certain 
characteristics  of  the  equivalent  M/G/1  system  with  infinite 
waiting  room.  Specifically,  it  shows  that  server  utilization  in 
the  former  system  is  a  function  both  of  the  load  factor  and  of 
the  proportion  of  time  that  there  would  be  fewer  than  N  customers 
in  the  latter  system.  Knowing  this  last  quantity  is  equivalent 
to  knowing  the  full  equilibrium  distribution  of  the  number  of 
customers  in  the  M/G/1  system  with  infinite  waiting  room. 


It  is  obvious  that  the  P'j  are  dependent  on  certain 
characteristics  of  the  service  time  distribution.  In  fact,  to 
determine  the  P*j  we  must  fully  specify  the  service  time 
distribution,  since  the  Pollaczek-Khinchi n  mean  value  equation 
yields  exactly  M-1  moments  of  the  equilibrium  distribution,  given 
M  moments  of  the  service  time  distribution  [Kleinrock  1975]. 

Since  no  finite  number  of  moments  will  suffice,  nothing  short  of 
the  Laplace  transform  of  the  service  time  distribution  is 
adequate,  and  (7)  is  very  near  to  the  result  we  have  been 
seeking. 


Before  continuing  with  the  proof,  consider  (7)  in  a  more 
intuitive  light.  It  shows  that  for  a  fixed  load  factor,  i.e., 
for  a  fixed  mean  service  time  and  mean  interarrival  time  (or  I/O 
service  time,  in  the  equivalent  network  model) ,  a  range  of  server 
utilizations  may  be  attained.  This  range  depends  upon  three 
factors:  the  characteristics  of  the  service  time  distribution, 

the  load  factor,  and  the  capacity  of  the  system,  N  (the 
multiprogramming  level  in  the  equivalent  network  model) . 

As  the  load  factor  increases,  the  effect  of  the  service  time 
distribution  becomes  more  pronounced.  This  makes  sense;  the  CPU 
(in  the  equivalent  network  model)  is  becoming  the  bottleneck 
device.  The  role  played  by  the  multiprogramming  level  is  more 
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complex  to  analyze.  Clearly,  as  N  goes  to  infinity  the  effect  of 
the  service  tine  distribution  diminishes.  It  is  less  obvious 
(but  equally  correct)  that  the  effect  is  also  minimal  for 
extremely  small  values  of  N,  principally  because  the  P'O  term  of 
the  summation  is  determined  solely  by  the  load  factor,  which  is 
held  constant.  Neither  of  these  extreme  cases  is  of  practical 
interest . 

In  (7),  we  established  the  dependence  of  server  utilization 
upon  the  Laplace  transform  of  the  service  time  distribution.  To 
complete  the  proof  we  must  demonstrate  that  the  relationship  is 
an  inverse  one.  Me  must  show  that  for  arbitrary  but  fixed  load 
factor  and  N,  the  probability  that  an  H/G/1  queueing  system  with 
infinite  waiting  room  contains  fewer  than  N  customers  is  a 
decreasing  function  of  the  Laplace  transform  of  the  service  time 
distr ibut ion. 

He  denote  the  Laplace  transform  by  B*[s].  Me  say  that  the 
Laplace  transform  corresponding  to  a  particular  service  time 
distribution  is  greater  than  some  other  Laplace  transform  if  the 
value  of  the  first  transform  evaluated  at  some  point  s'  is  not 
less  than  the  value  of  the  second  transform  evaluated  at  that 
pcint ,  for  all  s'. 

Note  that  the  moments  of  a  distribution  can  be  recovered  from 
its  Laplace  transform  by  applying 


(8) 


where  the  parenthesized  superscript  denotes  differentiation. 

Thus  the  Laplace  transforms  of  all  service  time  distributions 
with  identical  means  will  have  the  same  slope  at  the  origin, 
although  their  values  there,  B*[0],  may  differ.  The  load  factor 
in  (7),  p,  can  be  expressed  in  terms  of  the  transform  by 


(9) 


p  =  -L{B*[0]}<1> 


where  L  is  the  arrival  rate. 


The  Pollaczek-Kh inchin  transform  equation  specifies  the 
generating  function  of  the  equilibrium  distribution  of  number  of 
customers  in  an  H/G/1  system  with  infinite  waiting  room  in  terms 
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of  the  Laplace  transform  of  the  service  time  distribution,  as 
follows  [ Kleinrock  1975]: 


(10) 


Q[  a  ] 


(a-1)  (1-p)  B*[  L  ( 1-a)  ] 

a  -  B*[  L  ( 1-  a)  ] 


The  probabilities  themselves  can  be  recovered  from  the  generating 
function  by  repeated  differentiation  according  to 


(11)  P'j 


C0[  0  ]} 


(j) 


jl 


Proceeding  in  this  manner,  we  obtain: 


(12) 

(13) 

(14) 


P'0  =  (1-p) 


P*1  =  (1-p) 


P'2 


(1-P) 


1  -  B*  [  L  ] 


1  +  L  { B*[  L  ]} 


(  a )  - 


B*[L] 


{B^[L]}2 


The  differentiations  become  increasingly  complex,  and  each 
successive  expression  involves  an  additional  derivative  of  the 
tr  ans  form . 


To  complete  the  argument,  we  rely  on  certain  properties  of 
the  equilibrium  distribution  of  number  of  customers  in  an  M/G/1 
system  with  infinite  waiting  room  (i. e. ,  the  probability  density 
function  of  the  P'j).  He  note  that  this  function  has  a  single 
maximum,  and  no  local  maxima  or  minima.  Thus  if  P'O  remains 
constant  and  PM  decreases,  for  example,  the  residual  probability 
will  be  distributed  among  the  remaining  P'j,  and  the  probability 
that  the  system  contains  N  or  fewer  customers  will  decrease. 

We  note  that  P'l  is  inversely  proportional  to  the  value  of 
the  Laplace  transform  evaluated  at  L.  For  P'2,  this  inverse 
relationship  is  more  pronounced  (because  of  the  squared  term  in 
the  denominator),  but  is  modulated  by  a  term  involving  the  first 
derivative  of  the  transform. 

From  (14)  and  the  discussion  following  it,  we  note  that  the 
moments  of  the  service  time  distribution  do,  in  fact,  play  an 
explicit  role.  They  are  L-moments  rather  than  central  moments. 
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however,  and  the  value  of  the  Laplace  transfora  evaluated  at  L  is 
the  dominant  term* 
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4.  The  Distribution  of  Response  Times  in 
Central  Server  Queueing  Networks 


Increasingly,  computer  systems  support  a  class  of  customers 
for  whoa  response  time  is  the  performance  measure  of  greatest 
interest.  For  a  variety  of  reasons,  the  mean  of  this  measure 
provides  significantly  less  information  than  do  the  means  of 
certain  other  measures,  such  as  device  utilization.  To  date, 
though,  no  analytic  method  has  been  devised  for  determining  the 
distribution  of  response  times  in  queueing  network  models. 


This  chapter  describes  such  a  method.  I  begin  by  deriving  an 
approximation  to  the  distribution  of  response  times  for  a 
customer  whose  number  of  cycles  through  the  queueing  network  is 
distributed  geometrically,  the  traditional  assumption  in  these 
models.  The  dependence  of  this  derivation  upon  the  unusual 
properties  of  the  geometric  distribution  leads  me  to  consider 
next  an  approximation  to  the  distribution  of  response  times  for  a 
customer  whose  number  of  cycles  is  known  exactly.  From  this 
latter  derivation  and  obvious  (if  rather  clumsy)  techniques,  the 
distribution  of  response  times  for  a  customer  with  an  arbitrary 
distribution  of  number  of  cycles  can  be  approximated.  In  the 
event  that  the  distribution  of  number  of  cycles  can  be  modelled 
using  Erlang’s  method  of  stages,  the  two  results  can  be  combined 
to  yield  an  especially  simple  approximation  for  the  distribution 
of  response  times. 


The  results  of  this  chapter  can  be  stated  quite  succinctly. 
In  a  queueing  network  model,  the  response  time  of  a  customer 
increases  linearly  with  the  total  amount  of  service  required  by 
that  customer.  This  service  requirement,  in  turn,  is  a  function 
of  the  mean  service  time  at  each  service  center,  the  transition 
probabilities,  and  the  distribution  of  number  of  cycles  through 
the  network.  Although  the  scheduling  disciplines  at  the  service 
centers,  the  presence  of  additional  customers  and  customer 
classes,  and  the  cha rac te rist ics  of  the  various  service  time 
distributions  other  than  their  means  each  play  a  role  in 
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deter  Dining  the  lean  response  time,  they  can  be  ignored  in 
approximating  the  distribution  of  response  times  from  this  mean 
and  the  distribution  of  number  of  cycles.  This  approximation 
becomes  more  accurate  as  the  expected  number  of  cycles  increases. 

The  results  are  obtained  through  straightforward  Laplace 
transform  analyses,  and  rely  upon  central  limit  theorems  that  can 
be  applied  because  of  the  regenerative  behaviour  of  the  models. 
The  approximations  display  excellent  accuracy  in  comparisons  with 
simulation  models  and  with  the  IBM  Time  Sharing  Option  at  the 
University  of  Toronto  Computer  Centre.  They  confirm  that 
aggregation  techniques  [Courtois  1977],  carefully  applied,  can 
lead  to  simple,  accurate  models  of  complex  computer  systems. 

The  Need  for  Distributional  Information 

The  equilibrium  state  distributions  provided  by  queueing 
network  models  are  well  suited  to  determining  the  mean  values  of 
performance  measures  such  as  throughput  and  device  utilization. 

To  specify  the  mean  utilization  of  a  service  center  conveys 
considerable  information.  A  processor  is  either  active  or  it  is 
not;  its  utilization  during  some  specific  period  of  time  cannot 
be  less  than  zero  percent,  nor  can  it  be  more  than  one  hundred 
percent. 

The  information  conveyed  by  the  mean  value  of  system  response 
time  is  considerably  less.  One  reason  is  that  the  maximum 
response  time  is  unbounded.  Another  is  that  variations  in 
response  times  have  adverse  psychological  effects  on  the  users  of 
interactive  systems,  as  illustrated  in  Chapter  1. 

Variations  in  response  times  arise  from  two  sources.  First, 
the  computational  requirements  of  a  particular  customer  may 
deviate  from  the  mean  requirements  of  the  customer  class  to  which 
it  belongs.  Second,  the  queue  length  encountered  on  a  particular 
visit  to  a  service  center  may  deviate  from  the  mean  queue  length 
there.  In  order  to  answer  questions  of  interest  to  the  users  of 
a  computer  system,  we  must  be  able  to  determine  at  least  the  mean 
response  time  conditioned  upon  required  service,  and  preferably 
the  distribution  of  this  measure. 
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Single  Queue  Analyses 


The  earliest  applications  of  queueing  theory  to  the  study  of 
computer  systems  relied  upon  classical  single  queue,  finite  or 
infinite  source  models.  In  recent  years,  though,  there  has  been 
a  strong  trend  towards  the  use  of  queueing  network  models. 
Queueing  networks  represent  the  structure  of  computer  systems 
with  considerable  realism,  and  can  be  evaluated  using  efficient, 
conceptually  simple,  iterative  techniques. 


Studies  of  the  distribution  of  response  times  in  computer 
systems  are  the  principal  exception  to  the  trend  towards  networks 
of  queues.  These  studies,  which  are  outgrowths  of  early  efforts 
to  evaluate  the  performance  of  various  scheduling  disciplines, 
have  continued  to  rely  upon  single  queue  models.  Typically,  the 
analysis  proceeds  along  the  following  lines: 


First,  using  knowledge  of  the  service  time  distribution  and 
the  scheduling  discipline,  the  Laplace  transform  of  the  response 
time  distribution  is  derived.  Next,  in  an  enviable  feat  of 
symbolic  gymnastics,  this  transform  is  inverted  by  inspection  to 
yield  the  cumulative  distribution  function  of  response  time. 
Finally,  since  the  formidable  expression  resulting  from  the 
inversion  of  the  Laplace  transform  contributes  little  to 
understanding  the  behaviour  of  the  system,  special  cases  or 
numerical  examples  are  considered  in  an  attempt  to  establish  an 
intuitive  basis  for  the  result. 


Outstanding  examples  of  this  genre  include  Coffman  et  al.  's 
use  of  an  infinite  source  M/M/1  queueing  system  to  evaluate  the 
processor  sharing  scheduling  discipline  [1970],  Muntz’s  use  of 
the  same  queueing  system  to  evaluate  the  round-robin  discipline 
[1972],  Sakata  et  al. 's  analysis  of  the  M/G/1  queue  under  round- 
robin  scheduling  [1971],  Sekino's  use  of  a  finite  source  M/M/m 
queue  to  model  the  performance  of  the  Multics  system  [1972],  and 
Lavenberg’s  response  time  analysis  of  the  M/G/1  system  with 
finite  waiting  room  [1975],  Kleinrock  [1976]  provides  an 
excellent  summary  of  these  and  related  results. 

Approximate  analyses  have  also  been  performed  for  certain 
single  queue  systems.  For  example,  in  the  process  of  deriving  a 
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hierarchical  approximation  to  a  closed  queueing  network  model, 
Avi-Itzhak  and  Heyman  [1973]  obtain  a  simple  approximate 
expression  for  the  response  time  distribution  of  an  fl/G/m  system, 
More  recently.  Song  has  applied  asymptotic  approximation 
techniques  to  two  single  queue  models:  a  finite  source,  single 
server  queueing  system  with  hyperexponentially  distributed 
service  times  and  foreground- back ground  scheduling  [Song  &  Muntz 

1976] ,  and  the  finite  source  H/M/m  model  studied  by  Sekino  [Song 

1977] ,  Wong’s  approach  yields  simple  expressions  for  the 
distribution  of  response  times  which  compare  favourably  to  exact 
analyses  when  the  number  of  customers  is  large.  His  results  for 
the  finite  source  M/M/m  model  have  an  especially  simple  form: 

Hong  demonstrates  that  the  response  time  distribution  is 
asymptotically  normal  as  the  number  of  customers  increases,  and 
derives  its  mean  and  variance  as  simple  explicit  functions  of  the 
model  parameters. 


o 


Figure  1 


Figure  2 


Figure  1  illustrates  the  finite  source  M/M/m  model  studied  by 
Sekinc  and  by  Wong.  The  model  represents  a  high  level  view  of  a 
time  sharing  system  in  which  N  customers  compete  for  admission  to 
a  multiprogramming  set  of  size  m.  The  servers  at  the  top  of  the 
figure  represent  the  N  interactive  terminals;  the  service  center 
representing  the  computer  system  has  one  server  for  each  of  the  m 
customers  in  the  multiprogramming  set.  The  response  time  of  a 
customer  includes  both  the  time  spent  awaiting  admission  to  the 
multiprogramming  set  and  the  time  spent  receiving  service. 
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The  accuracy  of  Sekino's  exact  analysis  depends  upon  two 
assumptions:  that  the  complex  activities  required  to  service  an 

interactive  request  can  be  represented  by  a  single  service  center 
which  is  visited  only  once  per  interaction,  and  that  the  delay 
encountered  in  receiving  service,  which  includes  multiple 
queueing  times  and  service  times  at  the  various  devices  in  the 
actual  system,  is  exponentially  distributed.  Wong’s  asymptotic 
analysis  imposes  two  additional  assumptions:  that  substantial 
queueing  for  admission  to  the  multiprogramming  set  occurs,  and 
that  there  is  a  sufficient  number  of  customers  that  the  rate  at 
which  requests  arrive  to  the  system  is  independent  of  the  number 
of  requests  pending.  (Given  these  assumptions,  the  normal 
distribution  of  response  times  follows  from  application  of  the 
Central  Limit  Theorem.  A  similar  technique  will  be  used  later  in 
this  chapter,  and  discussion  of  it  will  be  deferred  until  then.) 

These  assumptions  are  common  to  most  of  the  single  queue 
models  for  which  distributions  of  response  times  have  been 
obtained.  Representing  the  computer  system  as  a  network  of 
queues  rather  than  as  a  single  service  center,  as  illustrated  in 
Figure  2,  eliminates  these  potential  sources  of  error.  As  well, 
this  approach  allows  multiple  classes  of  customers  to  be 
represented.  For  all  of  these  reasons,  obtaining  an 
approximation  to  the  distribution  of  response  times  for  queueing 
network  models  invites  attention.  As  Figure  2  indicates,  the 
measure  of  response  time  considered  here  does  not  include  time 
which  may  be  spent  queueing  for  admission  to  the  multiprogramming 
mix.  In  many  systems,  this  delay  is  negligible;  its  inclusion  in 
hierarchical  models  is  discussed  both  in  this  chapter’s  summary 
and  in  Chapter  5. 

Networks  With  Geometrically  Distributed  Cycle  Requirements 

Consider  a  central  server  queueing  network  with  N  identical 
customers,  M  I/O  devices,  and  arbitrary  service  time 
distributions  and  scheduling  disciplines  throughout.  As  is 
conventional,  a  customer  has  a  fixed  probability  of  completing 
(and  being  instantaneously  replaced  by  a  statistically  identical 
customer)  at  the  conclusion  of  each  cycle  through  the  network;  in 
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other  words,  the  number  of  cycles  required  by  each  customer  is 
geometrically  distributed.  For  the  sake  of  future  simplicity,  a 
cycle  will  be  defined  to  end  after  service  at  the  I/O  subsystem 
rather  than  the  CPU. 

Let  Q*i[s]  denote  the  Laplace  transform  of  the  response  time 
distribution  for  a  single  visit  to  a  particular  service  center, 
i.  (The  question  of  how  to  determine  this  transform  will  be 
addressed  shortly.)  If  Pi  denotes  the  probability  that  a 
customer  proceeds  to  I/O  service  center  i  (1<i<M)  after 
completing  service  at  the  CPU,  then  the  Laplace  transform  of  the 
response  time  distribution  at  the  I/O  subsystem,  R*[s],  can  be 
expressed  as 

(1)  R*[s]  =  .1  Pi  Q*i[s] 

i=1 

Assuming  that  response  times  at  the  CPU  and  the  I/O  subsystem 
are  independent  of  one  another,  the  distribution  of  response 
times  for  one  cycle  through  the  network  is  equal  to  the 
convolution  of  the  CPU  and  the  I/O  subsystem  response  time 
distributions.  Its  Laplace  transform,  S*[s],  can  be  written  as 

(2)  S*[  s  ]  =  Q*0[s]  R*[s] 

where  Q*0[s]  is  the  Laplace  transform  of  the  response  time 
distribution  at  the  central  server.  (Hong  [1973]  has  derived  an 
exact  expression  for  S*[s]  when  all  service  time  distributions 
are  exponential.  His  work  remains  incomplete,  and  the  complexity 
of  the  derivation  for  this  restricted  case  encourages  us  to  seek 
an  approximate  result.) 

Let  C  be  the  probability  that  a  customer  completes  at  the  end 
of  a  cycle.  Then  the  probability  that  a  program  cycles  exactly  k 
times,  Zk,  is 

k-1 

(3)  Zk  =  C  (1-C)  1<k<  oo 

Assuming  that  the  response  times  encountered  on  successive  cycles 
are  independent  of  one  another,  the  response  time  distribution  of 
a  customer  requiring  exactly  k  cycles  is  equal  to  the  single 
cycle  response  time  distribution  convolved  with  itself  k-1  times. 
For  customers  whose  cycle  requirement  is  geometrically 
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distributed,  the  Laplace  transform  of  the  overall  distribution  of 
response  times  can  be  written  as 

(4)  T*[s]  =  I  Zk{S*[s]}k 

k=1 


«  k-1  k 

=  c  2  (i-c)  cs*[s]) 

k=1 


Since  S*[ s  ]  is  less  than  one,  this  expression,  the  sum  of  a 
geometric  series,  can  be  simplified  and  rewritten  as 


(5) 


T*[  s  ] 


C  S*[s] 

1  -  <1-C)S*[s] 


(Fcr  the  more  traditional  model,  in  which  customers  leave  the 
system  after  service  at  the  CPU,  S*[ s  ]  is  simply  replaced  by 
Q*0[s]  in  the  numerator  of  this  last  expression.) 


Careful  consideration  of  (5)  yields  insights  concerning  both 
the  nature  of  T  (the  overall  distribution  of  response  times)  and 
the  versatility  of  Laplace  transforms  as  an  analytic  tool. 

First,  let  us  make  the  assumption  that  the  response  times  for  a 
single  cycle  through  the  queueing  network  are  exponentially 
distributed.  It  is  easy  to  demonstrate  that  in  this  case,  the 
overall  distribution  of  response  times  is  also  exponential. 
Beginning  with  the  Laplace  transform  of  an  exponential 
distribution,  we  have 


S*[s] 


L 

s  ♦  L 


where  L  is  the  parameter  of  the  distribution  (the  inverse  of  the 
expected  duration  of  one  cycle).  Substituting  into  (5)  yields 


T*[  s  ] 


CL 

s  +  L 
( 1  -C)  L 
s+L 


CL 


s  ♦  CL 
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In  other  words,  response  times  are  exponentially  distributed  with 
a  parameter  of  CL,  or  a  mean  equal  to  the  expected  number  of 
cycles  multiplied  by  the  expected  duration  of  a  single  cycle, 
Moore  [1971]  and  Buzen  [1971]  both  make  this  observation. 

Should  response  times  for  a  single  cycle  be  constant  and 
identical,  an  analysis  similar  to  the  preceding  one  demonstrates 
that  the  resulting  overall  distribution  of  response  times  is 
geometric.  Once  again,  the  mean  response  time  is  equal  to  the 
expected  number  of  cycles  multiplied  by  the  time  required  to 
complete  one  cycle. 

Of  course,  few  systems  are  blessed  with  either  deterministic 
or  exponential  distributions  of  single  cycle  response  times. 
Proceeding  from  (5),  though,  it  is  possible  to  demonstrate  that 
regardless  of  the  form  of  S  (the  distribution  of  response  times 
for  a  single  cycle),  the  overall  response  time  distribution,  T, 
is  asymptotically  exponential. 

To  understand  why  this  is  the  case,  consider  the  response 
time  distribution  corresponding  to  some  specific  number  of 
cycles,  k.  After  each  of  the  k- 1  convolutions  required  by  (4), 
the  coefficient  of  variation  of  the  resulting  distribution  will 
be  decreased.  (Since  the  response  times  encountered  on 
successive  cycles  are  assumed  to  be  independent  of  one  another, 
each  convolution  increases  the  mean  of  the  resulting  distribution 
by  an  amount  equal  to  the  mean  of  the  single  cycle  response  time 
distribution,  and  the  variance  of  the  resulting  distribution  by 
an  amount  equal  to  the  variance  of  the  single  cycle  response  time 
distribution.  Thus  the  coefficient  of  variation  of  the  resulting 
distribution  decreases  with  increasing  numbers  of  convolutions.) 
For  large  values  of  k,  then,  the  sum  comprises  distributions  with 
very  low  coefficients  of  variation,  uniformly  spaced  means,  and 
weighting  coefficients  that  are  the  terms  of  a  geometric  series. 
Since  the  number  of  convolutions  required  to  reach  a  certain 
percentile  is  inversely  proportional  to  C,  this  exponential 
approximation  becomes  accurate  at  lower  percentiles  as  C 
decreases. 

More  formally,  the  moment  generating  property  of  the  Laplace 
transform  can  be  used  to  verify  that  as  C  decreases,  i.e.. 


as  the 
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expected  number  of  cycles  required  by  a  job  increases,  the 
moments  of  T  converge  to  the  loients  of  an  exponential 
distribution.  The  noaents  of  a  distribution  can  be  found  by 
repeated  differentiation  of  its  Laplace  transform  according  to 

E[tj]  =  (-1)  j{T*[0]}  (j) 

(The  parenthesized  superscript  denotes  differentiation.) 
Evaluating  the  negative  of  the  first  derivative  of  (5)  at  zero 
confirms  that  the  aean  of  T  is  equal  to  the  lean  of  S  nultiplied 
by  the  expected  number  of  cycles: 


-  {T*[0])<»> 


{S*[01)(t) 

c 


Differentiating  (5)  once  again  provides  the  second  aoment  of 
T.  The  squared  coefficient  of  variation  of  T  can  be  calculated 
by  substitution  into 


E[t«  ]  -  E2[t ] 
E2[t] 


When  this  substitution  is  performed,  the  following  expression  for 
the  squared  coefficient  of  variation  results: 


...  -  ts*[ 0 ]} <= »  -  (-ts*[0 ]}<»>)* 

(6)  C - ♦  1  -  C 

(-  (s*[  0  ]}  < 1 M  2 

The  first  tera  in  this  expression  is  equal  to  C  times  the  squared 
coefficient  of  variation  of  the  distribution  of  response  times 
for  a  single  cycle,  S. 

In  other  words,  as  C  decreases,  the  coefficient  of  variation 
of  T  varies  from  the  coefficient  of  variation  of  S  to  one  (which 
is  the  coefficient  of  variation  of  an  exponential  distribution). 
Convergence  to  one  is  linear  with  decreasing  C.  ( A  similar 
analysis  was  carried  out  by  Kienzle  [  1976  ]  in  studying  round- 
robin  scheduling  of  infinite  source,  single  server  queueing 
systems. ) 

An  analogous  exercise  can  be  carried  out  for  the  coefficient 
of  skewness.  This  measure,  a  normalized  moment  in  the  same  sense 
as  the  coefficient  of  variation,  is  equal  to 
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E[t3]  -  3E[  t  ]E[  t2  ]  +  2E3[  t  ] 

— 

(E[  t2  ]  -  E2[  t  ]) 

The  coefficient  of  skewness  of  an  exponential  distribution  is 
equal  to  two,  and,  as  C  decreases,  the  coefficient  of  skewness  of 
T  converges  to  this  value. 


In  summary,  subject  to  the  independence  assumptions  that  have 
been  stated,  the  cumulative  distribution  function  of  the  response 
time  distribution  for  a  customer  with  a  geometrically  distributed 
cycle  requirement  in  a  central  server  queueing  network  can  be 
approximated  by 


(7)  Pr  (t  <  x) 


-x  (C/E[  s ]) 

1  -  e 


where  E[  s  ]  is  the  expected  duration  of  a  single  cycle.  Using 
this  expression,  the  approximate  distribution  of  response  times 
can  be  determined  directly  from  data  provided  by  any  queueing 
network  solution  technique.  The  mean  of  S  can  be  expressed  as 
the  product  of  the  completion  probability,  C,  and  the  mean  of  T. 
In  turn,  the  mean  of  T  may  be  determined  from  the  throughput  rate 
of  the  queueing  network  model  and  the  mean  number  of  customers 
present.  This  last  statement  may  be  viewed  as  an  application  of 
Little's  formula,  Kleinrock's  conservation  law  [Kleinrock  1976], 
or  Buzen's  operational  analysis  [ Buzen  1976]:  if  the  equilibrium 
rate  at  which  requests  are  completed  is  H  and  the  equilibrium 
number  of  customers  either  receiving  or  awaiting  service  is  J, 
then  the  mean  response  time  is  J/H. 


A  number  of  comparisons  with  simulation  models  have  verified 
the  accuracy  of  this  technique.  Throughout  this  chapter,  a 
central  server  queueing  network  with  three  exponential  I/O 
service  centers  and  five  customers  will  be  used  as  an  example. 
Service  times  at  the  CPU  are  represented  by  a  balanced  (in  the 
sense  discussed  in  Chapter  3)  two-stage  hyperexponential  server 
with  mean  and  coefficient  of  variation  each  equal  to  9.  The 
completion  probability  is  .01.  The  other  relevant 
characteristics  of  the  model  are  shown  below: 
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other  words,  long  response  tines  sometimes  occur  when  short 
queues  are  encountered,  and  vice  versa. 

This  intuition  has  been  verified  by  examining  the  results  of 
numerical  convolutions  in  which  successive  queue  lengths  were 
correlated  with  one  another.  The  experiments  were  performed 
using  the  characteristics  of  the  example  system  described  above. 
The  degree  of  correlation  was  varied  from  one  extreme,  at  which  a 
customer  encountering  n  other  customers  at  the  CPU  also 
encountered  n  other  customers  at  the  I/O  subsystem,  through  the 
region  in  which  there  was  no  correlation,  to  the  opposite 
extreme,  at  which  a  customer  encountering  n  customers  at  the  CPU 
encountered  N-1-n  customers  at  the  I/O  subsystem.  (N,  as  before, 
denotes  the  total  number  of  customers  in  the  system.)  The 
distributions  of  response  times  for  these  various  cases  were 
within  ten  percent  of  one  another. 

The  third  approximation  involves  the  overall  exponential 
distribution  of  response  times.  As  one  would  expect  from  the 
earlier  discussion,  this  approximation  is  most  accurate  for 
smaller  completion  probabilities  (less  than  one  tenth)  and  larger 
percentiles  of  the  response  time  distribution  (greater  than  the 
twentieth).  Fortunately,  this  range  of  accuracy  encompasses  the 
cases  of  greatest  practical  interest. 

More  critical,  however,  is  an  assumption  implicit  in  all 
queueing  network  models:  that  the  number  of  cycles  required  by  a 
customer  is  geometrically  distributed.  Certain  studies  (e.g., 

[ Boyse  1971])  argue  that  this  is  a  reasonable  characterization  of 
the  actual  behaviour  of  computer  systems.  Nonetheless,  the 
preceding  derivation  relies  upon  the  unusual  properties  of  the 
geometric  distribution,  and  it  wculd  appear  that  more  valuable 
information  could  be  obtained  if  the  distribution  of  number  of 
cycles  were  restricted.  For  this  reason,  I  now  turn  my  attention 
to  a  different  problem:  determining  the  response  time 
distribution  for  a  customer  whose  number  of  cycles  is  known 
precisely. 
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A  Customer  Whose  Cycle  Requirement  Is  Known  Precisely 

In  the  previous  section,  the  unusual  properties  of  the 
geometric  distribution  of  number  of  cycles  through  a  queueing 
network  allowed  a  simple,  accurate  approximation  to  the 
distribution  of  response  times.  This  section  considers  the 
distribution  of  response  times  for  a  customer  requiring  a 
specific  number  of  cycles.  Although  the  approximate  form  of  this 
distribution  is  easily  obtained,  estimating  its  parameters  for  a 
specific  system  requires  a  rather  detailed  analysis  for  which 
several  restrictions  of  the  model  are  necessary. 

The  eventual  objective  of  this  chapter  is  an  approximation  to 
the  distribution  of  response  times  for  a  customer  with  an 
arbitrary  distribution  of  number  of  cycles.  The  detailed 
analysis  in  this  section  leads  the  way  to  a  simple  approximation 
for  this  general  case,  which  will  be  described  in  the  section 
that  follows.  Deriving  the  distribution  of  response  times  for  a 
customer  with  a  specific  cycle  requirement  is  worthwhile  for  this 
reason,  and  because  the  analysis  reveals  a  number  of  interesting 
properties  of  closed  queueing  networks,  as  well. 

The  fundamental  result  is  an  extremely  simple  one.  As 
before,  let  S*[s]  denote  the  Laplace  transform  of  a  customer's 
response  time  distribution  for  one  cycle  through  the  network. 

Then  the  overall  distribution  of  response  times  for  a  customer 
requiring  exactly  k  cycles  is  equal  to  the  single  cycle  response 
time  distribution  convolved  with  itself  k-1  times,  subject  to  the 
assumption  of  independence  between  successive  cycles.  T*£s],  the 
Laplace  transform  of  the  overall  response  time  distribution,  can 
be  written  as 

( 8)  T*[sJ  =  {S*[s]}k 

The  Central  Limit  Theorem  states  that  T,  the  distribution  of 
response  times,  will  be  asymptotically  normally  distributed  as  k 
increases,  with  a  mean  and  variance  that  can  be  calculated  from  k 
and  the  mean  and  variance  of  S,  the  single  cycle  response  time 
distribution.  From  this  information  and  the  cumulative 
distribution  function  of  the  standard  normal  distribution,  it  is 
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possible  to  calculate  the  cumulative  distribution  function  of  the 
response  tiae  distribution  by  using 


(9) 


Pr(t  <  x)  =  Pr(n  < 


x  -  kE[s] 


(ME[  s*  ]-E2[s])  )  »/* 


) 


This  equation  expresses  the  transformation  of  a  normally 
distributed  random  variable  to  the  standard  normal  distribution. 
On  the  right  hand  side  of  the  equality#  n  is  a  sample  from  the 
standard  normal  distribution#  kE[ s]  is  the  mean  response  time 
(expressed  in  terms  of  the  number  of  cycles  and  the  expected 
duration  of  a  single  cycle)  and  the  denominator  is  the  standard 
deviation  of  the  response  time  distribution. 


Once  again,  comparisons  with  simulation  models  have  been  used 
to  verify  the  accuracy  of  the  approximation.  If  the  single  cycle 
distribution  of  response  times  has  a  low  coefficient  of  variation 
(e.g.,  an  Erlang  distribution)#  then  accuracy  to  within  fifteen 
percent  can  typically  be  achieved  for  jobs  that  require  as  few  as 
five  cycles,  on  the  average.  For  other  single  cycle 
distributions  (e.g.,  hyperexponential)#  as  many  as  thirty  cycles 
may  be  necessary.  Returning  to  the  example  of  the  previous 
section,  the  following  table  and  graph  compare  the  response  time 
distribution  observed  via  simulation  to  the  response  time 
distribution  predicted  using  the  approach  just  described#  for  a 
customer  requiring  precisely  100  cycles  through  the  network: 
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Unfortunately#  the  problem  is  not  yet  solved.  In  the 
validation  experiments#  the  mean  and  variance  of  S  (the 
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distribution  of  response  times  for  a  single  cycle  through  the 
network)  were  obtained  directly  from  the  simulation.  In  the  last 
section,  a  method  for  obtaining  the  mean  of  S  from  data  provided 
by  any  gueueing  network  solution  technique  was  described. 

Onf or tunately ,  though,  the  variance  of  S  cannot  be  obtained  in  a 
straightforward  manner.  In  other  words,  although  (9)  appears  to 
be  a  robust  approximation,  it  is  not  obvious  how  to  determine  one 
of  its  components,  E[s2],  from  a  gueueing  network  model. 

The  remainder  of  this  section  is  devoted  to  deriving  the 
variance  of  S  in  terms  of  known  quantities.  The  general  approach 
involves  decomposing  S  into  its  constituent  distributions. 

•  The  Distribution  of  Response  Times  at  a  Service  Center 

Equation  (2)  indicates  that  S,  the  distribution  of  response 
times  for  a  single  cycle  through  the  queueing  network,  is  equal 
to  the  convolution  of  QO,  the  response  time  distribution  at  the 
central  server,  and  R,  the  response  time  distribution  at  the  I/O 
subsystem.  Given  the  assumption  of  response  time  independence 
between  the  central  server  and  the  I/O  subsystem,  the  variance  of 
S  is  equal  to  the  sum  of  the  variances  of  QO  and  R.  (Were  this 
independence  assumption  incorrect,  it  would  be  necessary  to 
include  a  term  equal  to  twice  the  covariance.) 

In  this  subsection,  expressions  are  obtained  for  the  mean  and 
variance  of  the  response  time  distribution  at  the  central  server. 
The  analysis  applies  equally  well  to  any  service  center,  and  in 
the  next  subsection  it  will  be  generalized  to  obtain  expressions 
for  the  mean  and  variance  of  the  response  time  distribution  of 
the  I/O  subsystem  as  a  whole. 

Q*i[s],  the  Laplace  transform  of  the  response  time 
distribution  at  an  arbitrary  service  center,  can  be  decomposed 
and  expressed  as 

N-1 

(10)  Q*i[s]  =  2  Ai  (n)Q*i|n[s] 

n  =  0 

where  Ai(n)  is  the  probability  that  a  customer  arriving  at 
service  center  i  finds  n  customers  present,  and  Q*i|n[s]  is  the 
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Laplace  transform  of  the  response  time  distribution  at  service 
center  i  conditioned  on  n  customers  being  present  at  arrival. 
Dropping  the  subscript  i  and  working  in  terms  of  the  distribution 
rather  than  its  transform,  we  can  show: 


(11) 

Variance  (Q) 

=  E[Q2  ]  -  E2[Q  ] 

(12) 

N- 1 

E[Q]  =  £  A(n)  E[Q  jn] 

n=0 

(13) 

N-1 

E[  Q2  ]  =  2  A(n)E[Q2|n] 

n=0 

(14) 

E[ Q2  1  n  ]  = 

Variance  (Q  J  n)  +  E2£Q|n] 

In  other  words,  given  the  arrival  instant  distribution  of 
number  of  customers  at  a  service  center  and  the  mean  and  variance 
of  the  response  time  distribution  at  that  service  center 
conditioned  on  the  number  of  customers  present  at  arrival,  the 
mean  and  variance  of  the  response  time  distribution  at  the 
service  center  can  be  calculated.  Shortly,  procedures  for 
estimating  each  of  these  quantities  will  be  described. 

•  The  Distribution  of  Response  Times  at  the  I/O  Subsystem 

Assume  that  the  mean  and  variance  of  the  response  time 
distribution  at  each  service  center  in  the  I/O  subsystem  have 
been  calculated,  using  M  separate  applications  of  the  method  just 
outlined  (where  H,  as  before,  is  the  number  of  I/O  service 
centers).  The  same  reasoning  that  led  to  (11)  -  (14)  can  be 

applied  once  again  to  yield  expressions  for  the  mean  and  variance 
of  R,  the  response  time  distribution  at  the  I/O  subsystem  as  a 
whole,  in  terms  of  the  means  and  variances  of  the  response  time 
distributions  at  the  individual  service  centers.  Equations  (11) 

-  (14)  are  rewritten  as  follows: 

(15)  Variance  (R)  =  E[  R*  ]  -  E*[  R  ] 

(16)  E[R]  »  j  Pi  E[Qi] 

1=1 

(17)  E[R*]  =  j  Pi  E[Qi*] 

1=  1 
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(18)  E[Qi2]  =  Variance  (Qi)  ♦  E2[Qi] 

The  only  coaponents  required  to  complete  the  approximation  to 
the  distribution  of  response  times  are  expressions  for  A,  the 
arrival  instant  distribution  of  number  of  customers  at  a  service 
center,  and  for  the  mean  and  variance  of  Q|n,  the  distribution  of 
response  times  at  a  service  center  conditioned  on  the  number  of 
customers  present  at  the  arrival  instant.  In  the  paragraphs  that 
follow,  each  of  these  issues  is  addressed. 


•  The  Arrival  Instant  Distribution  of  Customers 

First,  consider  the  problem  of  determining  A,  the 
distribution  of  number  of  customers  at  a  service  center  at  the 
instant  of  arrival.  While  B,  the  equilibrium  distribution  of 
number  of  customers  at  a  service  center,  is  provided  by  most 
queueing  network  solution  techniques,  it  is  important  to  reali2e 
that  A  and  B  are  not  equivalent.  That  this  is  the  case  is  made 
clear  by  comparing  A{N),  which  must  equal  zero,  to  B(N),  which 
need  not.  The  source  of  this  discrepancy  lies  in  a  theme  that 
has  recurred  frequently  throughout  this  work:  finite  and 
infinite  population  systems  have  decidedly  different 
characteristics. 

The  equilibrium  and  arrival  instant  distributions  are  often 
remarkably  dissimilar.  Figures  5  and  6  illustrate,  respectively, 
the  equilibrium  and  the  arrival  instant  distributions  of  number 
of  customers  at  the  CPU,  for  the  model  that  has  been  used  in 
previous  examples. 

The  time-weighted  average  illustrated  in  Figure  5  is  biased 
toward  the  extremes.  The  vast  majority  of  customers  will  be 
served  at  the  high-rate  stage  of  the  hyperexponential  CPU  server, 
keeping  the  queue  length  near  zero  for  substantial  periods.  On 
those  occasions  when  a  customer  is  served  at  the  low-rate  stage, 
it  will  remain  at  the  service  center  for  a  considerable  period  of 
time,  and,  in  short  order,  all  of  the  remaining  customers  in  the 
system  will  queue  behind  it. 
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Customers  Present  Customers  Present 

Figure  5  Figure  6 

The  arrival  instant  distribution  illustrated  in  Figure  6  does 
not  exhibit  this  characteristic,  since  the  number  of  arrivals 
that  can  occur  during  a  long  service  interval  is  bounded  by  the 
number  of  customers  in  the  system.  The  discussion  of  the 
distribution  of  residual  lifetimes  in  the  next  subsection  will 
contribute  further  to  our  understanding  of  the  distribution 
illustrated  in  Figure  6. 

Two  results  from  the  theory  of  single  server  queues  are 
relevant  to  the  problem  of  determining  A  when  B  is  known.  The 
first  concerns  the  P1/M/1  queueing  system  with  a  fixed  number  of 
customers,  N.  This  system  has  an  arrival  rate  that  is 
proportional  to  N-n,  where  n,  as  usual,  is  the  number  of 
customers  at  the  service  center.  Cooper  [1972]  shows  that  the 
arrival  instant  probabilities  of  encountering  various  numbers  of 
customers  in  this  system  egual  the  corresponding  equilibrium 
probabilities  for  the  same  system  with  N- 1  customers.  In  other 
word  s 


A  (n)  =  B  (n)  0<n<N-1 

N  N-1 

Unfortunately,  this  equation  exhibits  severe  errors  when  used  as 
an  approximation  in  systems  where  the  CPU  service  time 
distribution  is  not  exponential.  This  error  is  evident  from 
Figures  5  and  6;  the  arrival  instant  distribution  of  number  of 
customers  predicted  by  this  equation  would  have  approximately  the 
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same  form  as  the  equilibrium  distribution.  (The  result  is  cited 
here  principally  for  completeness. ) 

It  is  possible  to  obtain  an  estimate  for  A  which  is,  in 
practice,  quite  accurate.  This  is  accomplished  by  treating  the 
server  as  an  M/G/1  queueing  system  with  finite  waiting  room. 

This  system  also  can  accommodate  N  customers,  including  the  one 
receiving  service.  Its  arrival  rate  is  constant  and  independent 
of  n  unless  n=N,  in  which  case  it  is  zero.  Cooper  [1972]  shows 
that  the  arrival  instant  probabilities  for  this  system  equal  the 
corresponding  equilibrium  probabilities,  divided  by  one  minus  the 
equilibrium  probability  that  N  customers  are  present.  In  other 
words 

B  (n) 

A  (n)  = -  0<n<N-1 

1  -  B (N ) 

(Recall  that  the  M/G/1  system  with  finite  waiting  room  was  also 
of  use  in  discussing  the  characterization  of  service  time 
distributions  in  queueing  network  models.) 

The  accuracy  of  this  approximation  has,  once  again,  been 
verified  by  simulation. 

•  Response  Time  Conditioned  on  Customers  Present  at  Arrival 

The  previous  subsection  described  an  approximation  to  the 
arrival  instant  distribution  of  number  of  customers  at  a  service 
center.  Here,  a  complementary  approximation  is  described:  the 
calculation  of  the  mean  and  variance  of  Q|n,  i.e.,  the  mean  and 
variance  of  the  distribution  of  response  times  at  a  service 
center,  conditioned  on  the  number  of  customers  found  at  the 
service  center  by  an  arriving  customer. 

The  following  analysis  requires  that  the  scheduling 
discipline  at  each  service  center  be  first-come-f irst-ser ved. 

This  is,  in  general,  a  much  more  realistic  assumption  than 
processor  sharing,  the  discipline  most  frequently  encountered  in 
analytic  work.  Under  this  assumption,  the  response  time 
distribution  for  a  customer  encountering  n  other  customers  at  a 
service  center  is  the  convolution  of  n+1  independent 
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distributions.  The  arriving  custoaer,  as  well  as  any  customers 
already  in  the  gueue  but  not  yet  in  service,  will  require  a 
service  interval  with  mean  and  variance  equal  to  those  of  the 
service  time  distribution.  For  the  customer  presently  in  service 
(if  n>0) ,  the  mean  and  variance  of  the  residual  lifetime 
distribution,  i.e.,  the  distribution  of  the  remaining  service 
time  of  that  customer,  must  be  determined.  Once  this  has  been 
accomplished,  since  the  n+1  distributions  are  clearly  independent 
of  one  another,  their  means  and  variances  may  be  summed  to  yield 
the  mean  and  variance  of  Qjn.  Returning  to  the  transform  domain, 
if  F*[s]  is  the  Laplace  transform  of  the  service  time 
distribution  and  G*[s]  is  the  Laplace  transform  of  the  residual 
lifetime  distribution,  then  (once  again  ignoring  service  center 
subscripts) : 

(19)  Q*  |  n[  s  ]  =  F*[  s  ]  n=0 

=  G*[s](F*[s]}  0  n >0 

and,  substituting  into  (10) 

N-1  n 

(20)  Q*[  s]  =  A(0)F*[s]  +  G*[s]  2  A  (n)  (F*[s]} 

n=  1 

The  topic  of  residual  lifetime  properly  belongs  to  renewal 
theory,  for  which  Cox's  monograph  [1962]  is  an  excellent 
reference.  To  calculate  a  customer's  distribution  of  residual 
lifetime  clearly  requires  knowledge  of  the  distribution  of  the 
lifetime  itself,  i.e.,  the  customer's  service  time  distribution. 

It  is  important  to  observe  that  if  the  arrival  process  to  the 
service  center  is  Poisson  (and,  in  particular,  if  it  is 
independent  of  the  number  of  customers  already  at  the  service 
center)  then  more  customers  can  be  expected  to  arrive  during  an 
arbitrary  long  service  interval  than  during  an  arbitrary  short 
one.  In  other  words,  the  service  intervals  of  customers 
intercepted  by  random  arrivals  are  not  distributed  according  to 
the  overall  service  time  distribution. 

If  the  assumption  of  Poisson  arrivals  is  satisfied,  a  simple 
formula  exists  [ Kleinrock  1975]  that  expresses  the  kt h  moment  of 
the  residual  lifetime  distribution  in  terms  of  the  first  and 
k+lst  moments  of  the  service  time  distribution.  (The  formula 
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will  be  stated  and  applied  later  in  this  section.)  This  Poisson 
assumption  is  not  valid  in  a  closed  queueing  network,  however, 
and  the  formula  yields  large  errors  even  for  the  first  moment. 
Further,  the  dependencies  that  exist  are  sufficiently  complex 
that  deriving  an  adequate  formulation  is  quite  difficult. 
Informally,  extending  the  service  time  beyond  a  certain  point 
does  not  increase  the  expected  number  of  customers  that  will 
arrive  during  the  service  interval,  because  arrivals  have  ceased; 
the  entire  customer  population  is  already  in  the  queue. 

Should  the  service  time  distribution  at  a  service  center  be 
exponential,  the  memoryless  property  makes  the  calculation  of 
residual  lifetime  characteristics  simple  even  in  a  closed 
queueing  network:  its  mean  and  variance  are  equal  to  the  mean 
and  variance  of  the  service  time  distribution.  Equation  (20)  can 
be  simplified  to  read: 

N-1  n+1 

(21)  Q*[  s  ]  =  £  A(n)  ( F*[s]} 

n=0 

In  ether  words,  the  response  time  distribution  is  the  weighted 
sura  of  the  N  n-stage  Erlang  distributions,  1<n<N.  The  mean 
response  time  will  equal 

N-1 

(22)  E[Q]  =  E[F  ]  5  (n+ 1 )  A  (n ) 

n=0 

or  the  mean  service  time  multiplied  by  one  more  than  the  mean 
number  of  customers  at  the  service  center  at  the  instant  of 
arrival.  The  variance  of  response  times  can  be  calculated  using 
(11)  -  (14),  but  does  not  yield  special  insight;  in  particular, 
the  coefficient  of  variation  of  the  response  time  distribution  is 
not  limited  to  values  either  above  or  below  one. 

For  queueing  networks  composed  of  exponential  servers,  (22) 
completes  the  derivation  of  an  approximation  to  the  distribution 
of  response  times  for  a  customer  whose  cycle  requirement  is  known 
precisely.  In  the  presence  of  non-exponential  service  time 
distributions,  however,  determining  the  required  characteristics 
of  the  residual  lifetime  analytically  is  exceptionally  difficult. 
Fortunately,  extreme  accuracy  in  characterizing  the  residual 
lifetime  is  not  required  because  its  distribution  is  typically  a 
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sirall  component  of  the  response  time  distribution.  The  next 
section  describes  a  technique  for  estimating  the  mean  and 
variance  of  the  residual  lifetime. 


•  Estimating  the  Mean  and  Variance  of  Residual  Lifetime 

To  reiterate,  standard  techniques  for  calculating  the 
characteristics  of  the  distribution  of  residual  lifetimes  fail 
when  the  arrival  process  is  not  independent  of  the  state  of  the 
service  center.  These  techniques  maJce  the  assumption  that  the 
probability  of  an  arrival  occurring  during  a  service  interval  of 
length  f)  is  proportional  not  only  to  the  probability  of  the 
occurrence  of  intervals  of  that  length  (which  is  given  by  the 
probability  density  function  of  service  times),  but  also  to  $ 
itself . 


As  an  illustration,  consider  the  example  system  of  previous 
sections.  The  following  table  compares  the  means,  variances  and 
coefficients  of  variation  of  the  service  time  distribution,  the 
residual  lifetime  distribution  as  measured  via  simulation 
(referred  to  as  the  actual  residual  lifetime),  and  the  residual 
lifetime  distribution  as  calculated  assuming  Poisson  arrivals 
(referred  to  as  the  theoretical  residual  lifetime): 


d is  tr ibu  t ion 


coef f icien  t 

mean  variance  of _ variation 


service  time  9.  1 
actual  residual  lifetime  33.5 
theoretical  residual  lifetime  380.8 


6857 

45967 

580035 


9.  1 

6.4 

2.0 


The  effect  of  the  limited  customer  population  on  the  residual 
lifetime  distribution  is  striking. 

As  another  illustration,  consider  the  example  system  with  two 
alterations:  twenty  customers,  instead  of  five,  and  I/O  servers 

that  are  slower  by  a  factor  of  ten.  One  would  expect  these 
changes  to  make  the  arrival  process  less  dependent  on  the  number 
of  customers  at  the  CPU,  with  the  result  that  the  characteristics 
of  the  actual  residual  lifetime  more  closely  resemble  those  of 
the  theoretical  residual  lifetime.  In  fact,  the  actual  residual 
lifetime,  determined  via  simulation,  has  a  mean  of  240.4,  a 


-81- 


variance  of  279714,  and  a  coefficient  of  variation  of  2.2. 
Comparison  with  the  table  above  confirms  our  intuition. 

The  technique  that  follows  for  approximating  the 
characteristics  of  the  residual  lifetime  distribution  will 
require  that  the  CPU  be  the  only  service  center  in  the  queueing 
network  with  a  general  service  time  distribution.  This  is  by  no 
means  a  severe  restriction,  in  terms  of  the  complexity  of  models 
currently  in  use.  The  technique  relies  on  the  fact  that  if  the 
CPU  service  time  distribution  has  a  coefficient  of  variation 
greater  than  or  equal  to  one,  the  mean  and  variance  of  the  actual 
residual  lifetime  will  lie  between  the  means  and  variances, 
respectively,  of  the  service  time  and  theoretical  residual 
lifetime  distributions.  He  proceed  as  follows: 


n  Using  the  standard  equation 


(23) 


R 

j 


H 

j*1 


CM)", 


in  which  R  denotes  moments  of  the  residual  lifetime  distribution 
and  M  denotes  moments  of  the  service  time  distribution,  calculate 
the  mean  and  variance  of  the  theoretical  residual  lifetime  at  the 
CPU.  These  quantities  correspond  to  the  bottom  row  in  the 
preceding  table. 

n  From  the  mean  response  time  for  one  cycle  through  the  network 
(obtained  by  solving  the  queueing  network)  and  the  mean  response 
time  of  the  I/O  subsystem  (calculated  as  shown  in  a  previous 
section) ,  determine  the  mean  response  time  at  the  CPU. 

a  Consider  the  form  of  (20) .  It  indicates  that  given  the 
arrival  instant  distribution  of  customers  at  the  CPU  (A) ,  the 
mean  of  the  CPU  service  time  distribution  (E[F]),  and  the  mean 
response  time  at  the  CPU  (E[Q]),  the  actual  mean  residual 
lifetime  at  the  CPU  (E[G])  can  be  determined.  This  establishes  a 
set  of  three  values  such  as  appears  in  the  first  column  of  the 
preceding  table. 

a  Set  the  variance  of  the  actual  residual  lifetime  in  the  range 
between  the  variance  of  the  CPU  service  time  distribution  and  the 
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variance  calculated  using  (23).  (Once  again,  refer  to  the 
preceding  table.) 


As  a  rough  approximation,  the  variance  should  be  located 
proportionately  to  the  mean.  Although  this  linear  interpolation 
is  based  on  the  results  of  numerous  simulations,  it  lacks  a 
mathematical  basis.  Once  again,  I  emphasize  that  the  variance  of 
residual  lifetime  is  a  relatively  small  component  of  the  overall 
variance  of  response  times  at  a  service  center,  so  considerable 
error  can  he  tolerated. 

•  An  Example 

The  technique  just  described  completes  the  derivation  of  an 
approximation  to  the  response  time  distribution  for  a  customer 
with  a  specific  cycle  requirement  in  a  central  server  queueing 
network  where  the  CPU  has  a  general  service  time  distribution. 
Returning  cnce  again  to  our  canonical  example,  the  response  time 
distribution  that  follows  was  calculated  using  only  the  mean 
response  time  and  the  equilibrium  distributions  of  numbers  of 
customers  at  the  various  service  centers  (as  well  as  the  obvious 
information  concerning  service  time  distributions  and  selection 
probabilities) : 


%- ile  simulation  approx.  %  error 
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Figure 


Comparison  with  Figure  4 
have  been  necessary  in  d 
instant  distributions  of 


reveals  that  the  two  approximations  that 
eriving  E[s2]  analytically  (the  arrival 
numbers  of  customers  and  the  variance  of 
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the  residual  lifetime  distribution)  have  not  appreciably  affected 
the  overall  accuracy  of  the  technique. 

Our  ultimate  goal,  an  approximation  to  the  distribution  of 
response  times  for  a  customer  with  an  arbitrary  distribution  of 
number  of  cycles,  is  presented  in  the  next  section.  This 
approximation  is  an  application  of  the  techniques  developed  in 
this  section,  but  requires  neither  the  detailed  analysis  nor  the 
restrictive  assumptions  made  here. 

Arbitrary  Distributions  of  Number  of  Cycles 

Previous  sections  have  considered  central  server  queueing 
networks  under  two  different  assumptions  regarding  the  number  of 
cycles  through  the  network  required  by  a  customer.  In  the  event 
that  the  number  of  cycles  has  a  geometric  distribution,  the 
distribution  of  response  times  is  asymptotically  exponential  as 
the  expected  number  of  cycles  increases.  In  the  event  that  the 
number  of  cycles  is  known  precisely,  the  distribution  of  response 
times  is  approximately  normal. 

The  latter  approximation  would  seem  to  be  of  greater  direct 
benefit  than  the  former,  since  knowledge  of  the  distribution  of 
response  times  increases  in  value  as  the  service  requirements  of 
the  customer  are  specified  with  greater  precision.  The  two 
approximations  represent  extremes,  though,  and  behaviour  in 
between  should  be  considered. 

As  an  example,  one  might  imagine  a  system  with  a  bimodal 
distribution  of  required  cycles:  a  data  management  system  based 
upon  inverted  lists.  If  the  lists  required  for  a  particular 
query  exist,  the  request  can  be  satisfied  with  only  a  few  cycles 
of  computation  and  file  access.  Should  the  lists  not  exist, 
though,  many  cycles  will  be  required.  The  most  appropriate 
measure  of  system  response  time  would  include  both  possibilities, 
since  the  user  cannot  know  in  advance  into  which  class  a 
particular  query  will  fall. 


The  response  time  distribution  of  this  system,  then,  is  a 
composite  of  the  normally  distributed  response  times 
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ccrresponding  to  the  two  modes  of  required  service,  each  weighted 
by  the  probability  of  its  mode.  To  determine  the  cumulative 
distribution  function  of  response  times,  the  weighted  individual 
cumulative  distribution  functions,  each  of  which  has  the  form  of 
(9),  can  be  numerically  summed.  This  general  approach  is 
suitable  whenever  the  number  of  cycles  required  by  a  customer 
takes  on  a  small  number  of  values. 

Returning  once  again  to  the  transform  domain,  the  Laplace 
transform  of  the  composite  response  time  distribution  can  be 
expressed  as  the  weighted  sum  of  the  Laplace  transforms  of  the 
normal  response  time  distributions  corresponding  to  the  various 
numbers  of  cycles  required.  If  the  number  of  cycles  can  assume  Y 
values  and  the  probability  that  a  request  requires  each  of  these 
values  is  Pi,  1<i<Y,  then 


(24) 


T*[s]  = 


J  Pi  N*i[s] 

i=  1 


where  the  N*i[s]  are  the  Laplace  transforms  of  the  normal 
response  time  distributions  corresponding  to  the  various  numbers 
of  cycles  required. 

This  expression  is  an  important  one,  for  it  serves  to  bridge 
the  gap  between  the  two  extremes  that  have  been  discussed.  As 
the  distribution  of  number  of  cycles  becomes  more  complex  (i. e., 
as  it  comprises  more  values) ,  the  process  of  calculating  the 
composite  distribution  of  response  times  becomes  increasingly 
difficult.  However,  consider  the  extension  of  (24)  to  the  case 
in  which  the  distribution  of  number  of  cycles  is  geometric  (in 
which  case  the  composite  response  time  distribution  comprises  an 
infinite  number  of  normal  distributions): 


(25) 


®  k- 1  kMs+k  Vs* /2 

T*[  s  ]  =  C  2.  (1-C)  e 

k=  1 


oo  k-1  Ms+V s2/2  k 

=  ck|10-0  (e  } 


where  c  is  the  probability  that  a  customer  completes  at  the  end 
of  a  cycle,  and  the  exponential  function  is  the  Laplace  transform 
of  the  normal  distribution  approximating  the  response  times  of  a 
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customer  requiring  exactly  k  cycles.  (M  and  V  are  the  mean  and 
variance  of  the  normal  distribution  approximating  the  response 
times  of  a  customer  requiring  a  single  cycle.) 

This  expression  has  the  form  of  (4),  where  S*[s]  is  replaced 
by  the  Laplace  transform  of  a  normal  distribution.  The  same 
argument  used  in  that  section  can  be  applied  here  to  demonstrate 
that  the  overall  distribution  of  response  times  is  asymptotically 
exponential,  with  a  cumulative  distribution  function  approximated 
by  (7) . 

Of  course,  this  fact  is  reassuring  rather  than  surprising, 
but  it  has  a  significant  implication:  if  the  distribution  of 
number  of  cycles  can  be  modelled  as  a  composition  of  geometric 
distributions  using  a  discrete  equivalent  of  the  method  of  stages 
(for  example,  if  the  distribution  is  hyper geometr ic,  a  Bernoulli 
selection  from  among  geometric  distributions),  then  the  overall 
distribution  of  response  times  can  be  approximated  as  a 
composition  of  the  exponential  response  time  distributions  that 
correspond  to  the  various  geometric  stages  of  the  distribution  of 
number  of  cycles.  (An  example  will  be  given  in  the  next 
section. ) 

This,  in  turn,  leads  to  the  following  interpretation  of  the 
results:  As  the  expected  number  of  cycles  through  the  network 

required  by  a  customer  increases,  the  distribution  of  response 
times  depends  less  heavily  on  the  response  time  distributions  for 
the  individual  service  centers  and  for  a  single  cycle  through  the 
network.  Instead,  the  distribution  of  response  times 
asymptotically  resembles  the  distribution  of  the  number  of  cycles 
required  by  the  customer  (although  the  means  of  these 
distributions  differ).  If  this  latter  distribution  can  be 
represented  using  a  discrete  equivalent  of  the  method  of  stages, 
then  the  distribution  of  response  times  can  be  obtained  by 
substituting  exponential  stages  into  the  same  Cox-type  structure, 
each  stage  approximating  the  distribution  of  response  times  for 
the  corresponding  geometric  stage  in  the  distribution  of  number 
of  cycles.  If,  on  the  other  hand,  the  distribution  of  number  of 
cycles  consists  of  several  modes,  then  the  distribution  of 
response  times  can  be  obtained  by  combining  the  normal 
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distributions  approximating  the  distribution  of  response  times 
corresponding  to  each  mode.  Note  that  the  detailed  analysis 
described  in  the  previous  section  is  necessary  only  in  the  letter 
case;  if  the  exponential  approximation  can  be  used,  mean  response 
time  is  the  only  information  required. 

The  examples  of  previous  sections  demonstrate  that  these 
approximations  yield  reasonable  accuracy  for  relatively  small 
numbers  of  cycles.  More  convincing,  though,  is  (6),  which  shows 
the  rate  at  which  the  coefficient  of  variation  of  the  response 
time  distribution  approaches  the  coefficient  of  variation  of  an 
exponential  distribution,  for  a  customer  with  a  geometric 
distribution  of  required  cycles.  For  example,  if  S,  the  single 
cycle  response  time  distribution,  has  a  coefficient  of  variation 
of  5,  and  the  expected  number  of  cycles  through  the  network  is 
100,  then  the  coefficient  of  variation  of  T,  the  overall 
distribution  of  response  times,  will  be  1.1.  The  next  section 
contains  two  realistic  examples  that  further  illustrate  the 
utility  of  these  techniques. 

Two  Examples 

As  an  initial  example,  I  consider  the  distribution  of 
response  times  for  the  IBM  Time  Sharing  Option  as  implemented  at 
the  University  of  Toronto  Computer  Centre.  Qualitatively,  TSO 
response  time  has  been  aptly  characterized  by  Kernighan  [  1975]: 
•'Using  TSO  is  like  kicking  a  dead  whale  along  the  beach.”  In  a 
mere  quantitative  vein,  the  mean  response  times  of  TSO  systems 
have  been  modelled  analytically  by  lassettre  and  Scherr  [1972] 
and  by  Unruh  and  Sevcik  [1975],  among  others. 
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significant  one:  to  show  that  the  assumptions  and  approximations 
that  have  been  made  are  compatible  with  the  characteristics  of  an 
actual  system. 


The  thrust  of  the  results  in  this  chapter  is  that  in  a 
central  server  queueing  network,  the  approximate  distribution  of 
response  times  asymptotically  has  the  same  form  as  the 
distribution  of  number  of  cycles  through  the  network,  with  a  mean 
equal  to  the  expected  number  of  cycles  multiplied  by  the  expected 
duration  of  a  single  cycle.  Based  upon  measurement  data,  the 
nTCC  TSO  system  exhibits  this  property.  Therefore,  given  an 
accurate  analytic  model,  the  techniques  developed  in  this  chapter 
will  correctly  predict  its  response  time  distribution. 


This  assertion  is  based  upon  data  collected  during  several 
one-hour  periods  of  system  operation.  The  behaviour  of  the 
system  was  similar  during  each  of  these  intervals,  and  only  one 
of  them  is  considered  here.  During  this  period,  an  average  of  8 
users  were  logged  on.  Concurrently,  the  average  batch 
multiprogramming  level  was  7.  CPO  utilization  was  92%. 

The  probability  density  function  of  the  number  of  cycles 
required  by  an  interaction  has  the  form  illustrated  in  Figure  8. 
(A  cycle  corresponds  to  the  passage  of  the  user  process  through 
the  ^IVT  dispatcher.  The  distribution  is  discrete  rather  than 
continuous,  but  the  distinction  is  unimportant  for  present 
purposes.)  The  mean  number  of  cycles  required  is  53. 


Figure  8 


Figure  9 
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In  Chapter  3  it  w 
could  be  modelled  acc 
in  Figure  9.  A  paraa 
matching  the  Laplace 
also  described.  When 
distribution  illustra 
are  selected: 
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P:  .200 

T 1 :  .119 

T2 :  10.7 

T3 :  .103 


Observed  Response 
Time  Distribution 


he  parameter  set  that  is  chosen  when  the 
ethodology  described  in  Chapter  3  is  used  to 
nsform  of  the  observed  response  time 
ystem.  This  distribution,  obtained  from 
a  mean  of  2.33  seconds.  The  parameters  are 
ose  resulting  from  the  approximation 


Figure  10  compares  the  percentiles  of  the  observed  and 
predicted  distributions  of  response  times.  Their  similarity 
supports  the  utility  of  the  approximation  techniques  developed 
h  e  re . 
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The  second  example  illustrates  the  approximat ion  of  the 
response  time  distribution  for  a  customer  with  a  bimodal 
distribution  of  required  cycles,  such  as  might  arise  in  the  data 
management  system  described  in  the  previous  section. 


The  hypothetical  computer  system  consists  of  a  CPU  and  three 
exponential  I/O  devices.  Service  times  at  the  CPU  are 
represented  by  a  balanced  two-stage  hyperexponential  server  with 
a  mean  of  15  and  a  coefficient  of  variation  of  5.  There  are  30 
interactive  customers;  their  think  times  are  exponentially 
distributed  with  means  of  4000.  The  other  relevant 
characteristics  of  the  model  are  shown  below: 

service  selection 


center 

mean  proba bilit  y 

ut ilizat ion 

CPU 

15 

84% 

1 

15  .  3 

25% 

2 

10  .5 

28% 

3 

40  .  2 

45% 

The  response  time 

distribution  will  be  determined  for 

interactions  that 

require  5  cycles  through 

the  networ 

probability  of  .75,  and  25  cycles  with  a  probability  of  .25. 

The  overall  distribution  of  response  times  can  be 
approximated  as  a  weighted  sum  of  two  normal  distributions,  one 
corresponding  to  each  mode  of  required  service.  The  mean  and 
variance  of  each  of  these  constituent  distributions  can  be 
determined  from  the  mean  and  variance  of  the  single  cycle 
response  time  distribution.  The  expected  duration  of  a  single 
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cycle  can  be  obtained  directly  from  most  queueing  network 
solution  techniques,  but  since  the  CPU  service  time  distribution 
is  not  exponential,  the  variance  must  be  estimated  using  the 
char a c ter ist ics  of  the  various  service  time  distributions  and  the 
arrival  instant  distributions  of  numbers  of  customers  at  each 
service  center.  These  arrival  instant  distributions  must,  in 
turn,  be  estimated  from  the  corresponding  equilibrium 
d istr ibutions. 


When  the  techniques  described  in  previous  sections  are 
applied  to  this  model,  the  mean  and  variance  of  the  sinqle  cycle 
response  time  distribution  are  calculated  to  be  157  and  60485, 
respectively.  The  normal  approximations  to  the  response  time 
distributions  for  5  and  25  cycles  are  shown  below: 
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These  two  cumulative  distribution  functions  are  weighted  by 
their  corresponding  probabilities  and  summed  to  yield  the  overall 
distribution  of  response  times.  Figure  11  compares  the  predicted 
response  time  distribution  with  the  response  time  distribution 
obtained  via  simulation: 


-ile 

simulation 

approx . 

%  error 

10 

200 

150 

-25 

20 

300 

450 

♦  50 

3  0 

500 

650 

♦3  0 

40 

7  0  C 

800 

♦  14 

50 

1000 

1000 

0 

60 

1350 

1250 

-  7 

70 

1850 

1750 

-  5 

80 

2600 

2900 

♦  1  1 

90 

4100 

4250 

♦  4 

Figure  11 


3000 


As  in  previous  examples,  the  approximation  technique  provides 
good  accuracy. 


Summary 

The  traditional  emphasis  in  queueing  network  modelling  has 
been  on  system-oriented  performance  measures  such  as  device 
utilization.  The  few  existing  studies  of  user-oriented 
performance  measures  (e.g.,  response  times)  in  these  models 
consider  only  mean  values,  which  provide  an  insufficient 
characterization  of  system  performance. 

In  this  chapter,  I  have  derived  an  approximation  to  the 
distribution  cf  response  times  for  a  customer  in  a  central  server 
queueing  network  model  with  f irst-come-f irs t-served  scheduling 
and  a  general  CPU  service  time  distribution,  for  an  arbitrary 
distribution  of  the  number  of  cycles  through  the  network  required 
by  the  customer.  The  results  can  be  summarized  quite  simply.  As 
the  expected  number  of  cycles  required  by  a  customer  increases, 
the  distribution  of  response  times  assumes  the  form  of  the 
distribution  of  number  of  cycles,  regardless  of  the  distributions 
of  service  times  at  the  various  service  centers  in  the  network. 

In  other  words,  the  response  time  of  a  customer  increases 
linearly  with  the  total  amount  of  service  required  by  that 
customer,  which  can  be  determined  from  the  mean  service  times  at 
the  various  service  centers.  The  results  generalize  to  arbitrary 
networks  and  to  networks  with  several  customer  classes. 

These  results  are  similar  in  flavour  to  those  obtained  for 
infinite  source,  single  server  models  in  which  the  processor 
sharing  scheduling  discipline  is  employed  (cf.  [Coffman  et  al. 
1970]  and  [Sakata  et  al.  1971]).  This  leads  us  to  conclude  that 
a  single  service  center  can  accurately  represent  the  response 
time  behaviour  of  a  complex  computer  system  in  a  high-level 
model,  provided  that  a  detailed  gueueing  network  model  has  been 
used  tc  determine  the  appropriate  characteristics  for  the  server. 
It  lends  strong  support  to  Courtois'  theories  regarding  our 
ability  to  aggregate  subsystems  in  hierarchical  ("nearly 
completely  decomposable")  systems  [1977], 
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Two  different  approaches  have  been  taken  in  validating  these 
approximations.  The  first,  comparison  with  simulation  models,  is 
subject  to  the  criticism  that  the  simulation  model  and  the 
analytic  model  have  the  same  underlying  structure  (e.q.,  service 
times  are  independent,  identically  distributed  random  variables, 
and  I/O  service  centers  are  selected  by  Bernoulli  trials).  Hhile 
this  approach  provides  a  fair  test  of  the  approximation 
techniques  themselves,  it  does  not  argue  their  applicability  to 
actual  computer  systems.  The  second  approach  to  validation  does 
net  share  this  deficiency:  I  have  demonstrated  that  the 
behaviour  of  the  TSO  system  at  the  University  of  Toronto  Computer 
Centre  is  consistent  with  the  predictions  of  the  approximation 
techniques  that  I  have  developed. 

Despite  the  apparent  accuracy  of  these  approximations, 
caution  must  be  exercised  in  applying  them.  Perhaps  the  greatest 
potential  source  of  error  lies  in  the  fact  that  transients,  which 
can  cause  unusually  long  delays  in  actual  computer  systems,  are 
ignored.  The  transient  behaviour  of  queueing  systems  has 
received  scant  attention.  Recent  work  by  DeSantis  [1976], 
Grassmann  [1977],  and  Neuts  [1977]  shows  promise  in  this  respect, 
but  thus  far  the  distribution  of  response  times  has  been  obtained 
only  for  extremely  simple  models.  Iglehart  and  Shedler  [1977] 
have  applied  regenerative  simulation  to  this  problem  in  an 
analogous  manner.  These  approaches  will  be  discussed  in  the  next 
c  hapter. 
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5.  Implications  and  Directions  for  Future  Research 


The  final  sections  of  Chapters  3  and  4  have  summarized  the 
specific  contributions  of  this  thesis  and  discussed  their 
application.  To  briefly  reiterate: 
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□  In  Chapter  4,  an  approximation  to  the  distribution  of 
response  times  in  queueing  network  models  is  derived.  In  most 
cases  the  technique  is  extremely  simple  to  apply,  since  once  a 
detailed  model  has  been  used  to  determine  mean  response  time,  the 
distribution  of  response  times  can  be  approximated  on  the  basis 
of  hiqher-level  characteristics.  The  approximation  exhibits 
excellent  accuracy  in  comparisons  with  simulation  models  and  an 
existing  system. 

By  enhancing  the  ability  of  queueing  network  models  to  deal  with 
distributional  information,  these  contributions  increase  both  the 
accuracy  and  the  utility  of  these  models. 

The  present  chapter  is  more  speculative  in  nature,  and  has 
two  objectives.  I  will  first  attempt  to  place  the  thesis  as  a 
whole  in  perspective,  and  will  discuss  some  general  implications 
of  the  work.  Then,  I  will  outline  several  directions  for  future 
work  suggested  by  the  thesis. 
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Reflections 

In  order  to  place  the  preceding  chapters  in  perspective,  I 
must  briefly  confront  a  topic  assiduously  avoided  thus  far: 
computer  system  modelling  in  the  broad  sense.  A  model  represents 
a  theory  of  the  way  in  which  the  components  of  a  system  interact. 
This  thesis  makes  certain  contributions  to  that  general  theory, 
as  well  as  to  the  modeller’s  bag  of  technical  tricks. 

Properly  conceived,  a  model  reflects  exactly  those  aspects  of 
a  system  that  have  a  bearing  on  the  performance  measures  of 
interest.  As  a  consequence,  these  performance  measures,  along 
with  the  modeller’s  perception  of  system  interactions,  are  the 
primary  determinants  of  the  structure  and  the  level  of  detail  of 
the  model.  To  the  extent  that  a  particular  modelling  technique 
(e.g.,  simulation  or  analytic  modelling)  intrudes  on  this 
process,  the  technique  is  inappropriate. 

The  fact  that  networks  of  queues  can  realistically  model  the 
equilibrium  behaviour  of  computer  systems  in  specific 
environments  is  not  in  itself  evidence  that  they  provide  a 
suitable  framework  for  studying  these  systems.  There  are, 
however,  more  compelling  arguments  to  support  their 
appropriateness.  A  substantial  number  of  recently  documented 
case  studies  demonstrate  that  queueing  network  models  can 
accurately  predict  the  effects  of  configuration  and  load  changes. 
As  well,  insights  concerning  the  behaviour  of  queueing  network 
models  often  prove  to  be  directly  applicable  to  computer  systems. 
Observations  such  as  these  indicate  that  the  theory  of  computer 
system  interactions  embodied  in  queueing  network  models  captures 
many  essential  aspects  of  these  systems. 

One  of  several  bonds  between  the  study  of  service  time 
distributions  presented  in  Chapter  3  and  the  study  of  response 
time  distributions  presented  in  Chapter  4  is  the  observation  that 
the  tails  of  service  time  distributions  are  of  minimal  importance 
in  closed  queueing  networks.  This  is  an  example  of  an  insight 
that  applies  not  only  to  these  models,  but  to  computer  systems  as 
well.  It  gives  us  reason  to  revise  our  thinking  concerning  those 
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aspects  of  computer  systems  that  should  be  reflected  in  our 
models. 

Many  of  the  phenomena  encountered  in  the  course  of  this  work 
are  closely  related  to  this  observation.  In  Chapter  3 ,  the 
'•limited  damage”  argument  motivated  the  fact  that  describing  CPU 
service  time  distributions  in  terms  of  their  Laplace  transforms 
(which  emphasize  the  forms  of  distributions  near  the  origin) 
provides  a  superior  characterization  than  does  describing  these 
distributions  in  terms  of  their  moments  (which  tend  to  emphasize 
the  tails  of  distributions) .  In  Chapter  4,  the  minimal 
importance  of  the  tails  of  distributions  was  related  to  two  other 
observations:  that  mean  residual  lifetimes  in  closed  queueing 

networks  are  substantially  less  than  one  would  expect,  and  that 
the  equilibrium  and  arrival  instant  distributions  of  the  number 
of  customers  at  a  service  center  can  be  extremely  dissimilar. 

The  understanding  that  we  have  achieved  of  the  origins  of 
these  phenomena  argue  that  the  desirability  of  describing  service 
time  distributions  in  terms  of  certain  values  of  their  Laplace 
transforms  is  much  more  general  than  has  been  indicated.  It 
seems  safe  to  conjecture  that  this  approach  provides  a  superior 
characterization  at  all  service  centers  in  the  most  general 
network  topologies.  (I  also  suspect  that  an  analytic  proof  of 
the  Laplace  transform  result  throughout  the  central  server 
queueing  network  spectrum  must  exist.)  This  enhances  the  value 
of  the  parameter  selection  methodology  described  in  Chapter  3. 
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interactions  can  be  studied  without  reference  to  inter-group 
interactions,  and  vice  versa.  (Note  that  Norton's  theorem, 
described  in  Chapter  2,  is  a  specific  application  of  this 
theory.)  He  then  discusses  two  theorems  (first  stated  and  proved 
in  [Simon  5  Ando  1961])  specifying  the  conditions  under  which  the 
aggregation  approach  yields  an  acceptable  approximation  in 
systems  whose  interactions  cannot  be  completely  decomposed  into 
such  groups. 

Courtois*  application  of  this  theory  to  computer  systems  is 
based  on  the  proposition  that  in  hierarchically  organized 
systems,  the  interactions  at  a  particular  level  are  much  more 
frequent  than  the  interactions  between  various  levels.  Such 
systems  are  "nearly  completely  decomposable"  in  the  terminology 
of  Simon  and  Ando,  and  aggregation  techniques  should  yield 
reasonable  accuracy. 

Chapter  4  showed  that  the  distribution  of  response  times  for 
a  customer  in  a  queueing  network  model  has  approximately  the  same 
form  as  the  distribution  of  the  number  of  cycles  throuqh  the 
network  required  by  that  customer.  The  exceedingly  simple  nature 
of  this  result  is  due  to  the  fact  that  certain  low-level  aspects 
of  system  behaviour  are  aggregated;  the  accuracy  of  the  technique 
verifies  that  aggregation  is  appropriate  in  this  environment. 

Let  us  examine  this  aggregation  process  in  more  detail. 

The  results  in  Chapter  3  demonstrate  that  service  time 
distributions  at  the  various  service  centers  in  the  network  must 
be  characterized  more  accurately  than  by  their  means  alone,  in 
order  to  determine  mean  performance  measures  such  as  device 
utilizations,  queue  lengths,  and  throughput.  These  performance 
measures,  in  conjunction  with  Little's  result,  allow  us  to 
determine  the  expected  duration  of  a  single  cycle  through  the 
network . 

Once  this  has  been  done,  two  factors  make  it  possible  to 
ignore  the  detailed  interactions  that  occur  during  a  single  cycle 
in  approximating  the  distribution  of  response  times.  Each  is  a 
consequence  of  the  fact  that  customers  cycle  through  the  network 
a  number  of  times  in  obtaining  their  required  service,  and  each 
becomes  more  pronounced  as  the  expected  number  of  cycles 
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increases.  First,  the  distribution  of  the  total  service  required 
by  a  custoaer  becomes  governed  by  the  distribution  of  the  number 
of  cycles  through  the  network  and  the  mean  service  times  at  the 
various  service  centers.  The  importance  of  additional 
information  concerning  the  distributions  of  service  times 
diminishes.  Second,  the  service  disciplines  at  the  various 
service  centers  become  less  important.  Processor  sharing  becomes 
a  reasonable  approximation,  since  gueueing  occurs  frequently  and 
customers  interfere  randomly  with  one  another. 

The  assumptions  on  which  this  aggregation  rests  are  made 
explicit  by  the  development  in  Chapter  4.  The  demonstrated 
accuracy  of  the  approximation,  even  for  relatively  small  expected 
numbers  of  cycles  through  the  network,  provides  evidence  that 
aggregation,  carefully  employed,  provides  a  powerful  tool  for 
analyzing  the  performance  of  complex,  hierarchically  structured 
computer  systems. 

Ex  tensions 

The  work  of  this  thesis  can  be  extended  in  a  number  of 
directions  to  increase  the  utility  of  the  results.  Several  of 
the  more  interesting  topics  are  described  in  the  paragraphs  that 
follow . 

•  A  Fully  Automatic  Parameter  Selection  Methodology 

Chapter  2  noted  that  the  method  of  stages  has  not  been  fully 
exploited  largely  because  of  the  absence  of  a  specific,  well- 
motivated  parameter  selection  methodology.  The  work  of  Fux  and 
Herzog  described  in  Chapter  3  is  a  major  step  forward,  but 
suffers  from  several  shortcomings.  First,  its  cost,  which  is 
potentially  substantial,  is  not  assessed.  Second,  no  guidance  is 
provided  concerning  which  percentiles  of  the  observed 
distribution  should  be  matched.  Finally,  while  the  tolerances 
chosen  can  have  a  dramatic  effect  on  the  number  of  stages 
required,  their  effect  on  the  accuracy  of  the  model  in  which  the 
server  is  used  is  not  evaluated. 
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The  algorithm  described  in  Chapter  3  for  directly  matchinq 
the  Laplace  transform  of  an  observed  distribution  does  not  share 
these  deficiencies.  Further,  while  the  three  stage  Cox-type 
server  used  there  may  well  be  sufficiently  general  for  most 
applications,  the  algorithm  can  be  extended  to  the  general  Cox 
server  illustrated  in  Figure  3  of  Chapter  2,  reproduced  below: 


Figure  1 

For  present  purposes,  PI  (which  specifies  the  proportion  of 
customers  that  will  have  service  times  equal  to  zero)  can  be 
taken  to  be  zero.  The  probability  of  leaving  the  server  after 
completing  service  at  stage  R  is  implicitly  one.  The  Laplace 
transform  of  the  service  time  distribution  represented  by  this 
server  is 


C*[s] 


(1-P  j)/T  j 
s  +  1/Tj 


The  algorithm  described  in  Chapter  3  can  be  applied  to  this 
server,  and  P,  the  number  of  stages,  can  be  increased  until  a 
prescribed  tolerance  is  met.  Calculating  the  partial  derivatives 
of  C*[s]  with  respect  to  the  T  j ,  as  required  by  the  non-linear 
least  squares  approximation  procedure,  is  reasonably 
straightforward.  The  incorporation  of  this  fully  automatic 
parameter  selection  procedure  into  a  queueing  network  solution 
package  would  be  a  worthwhile  project. 

A  useful  analogy  can  be  drawn  between  describing  a  service 
time  distribution  in  terms  of  its  moments,  and  attempting  to 
approximate  a  function  from  knowledge  of  its  derivatives  at  a 
specific  point.  The  results  of  Chapter  3  demonstrate  that  the 
Laplace  transform  of  a  service  time  distribution  is  the  function 
of  interest  in  modelling.  Each  moment  of  the  distribution  can  be 
obtained  from  the  corresponding  derivative  of  its  Laplace 
transform,  evaluated  at  zero.  Directly  matching  Laplace 
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transform  values  at  various  points  provides  a  superior 
approximation  to  the  function. 


•  Characterizing  Service  Time  Distributions  In  Networks  With 
Several  Customer  Classes 


I  have  argued  that  de 
terms  of  certain  Laplace 
characterization  in  arbit 
might  ask,  however,  wheth 
time  distributions  can  be 
of  customers  in  a  model, 
variation  of  the  distribu 

Customer  classes  are 
high-level  characteristic 
of  particular  I/O  devices 
unrelated  to  the  distribu 
to  a  device.  Further,  my 
traces  from  the  Universit 
indicates  that  the  CPU  bu 
have  a  rather  high  coeffi 
indicate  that,  at  least  f 
service  time  distribution 
customer  classes  are  empl 


scribing  service  time  distributions  in 
transform  values  provides  an  appropriate 
rary  closed  queueing  networks.  One 
er  the  need  for  non-exponential  service 
eliminated  by  including  several  classes 
thereby  decreasing  the  coefficients  of 
tions  that  arise. 

typically  determined  on  the  basis  of 
s  such  as  total  service  time  or  the  use 
.  These  criteria  are  apt  to  be 
tion  of  service  times  on  single  visits 
own  experience  with  full  interrupt 
y  of  Toronto  Computer  Centre  system 
rsts  of  individual  programs  frequently 
cient  of  variation.  These  two  factors 
or  certain  applications,  non-exponential 
s  are  necessary  even  when  multiple 
oyed . 


•  The  Distribution  of  Response  Times  in  More  General  Networks 

The  results  in  Chapter  4  are  considerably  more  general  than 
is  indicated  there.  In  particular,  they  can  be  extended  from 
central  server  networks  to  arbitrary  network  topologies,  and  from 
single  to  multiple  customer  classes.  Each  of  these  extensions, 
though,  relies  upon  using  the  discrete  equivalent  of  the  method 
of  stages  to  represent  the  distribution  of  number  of  cycles 
through  the  network  required  by  the  customer  of  interest. 

First,  consider  the  generalization  to  arbitrary  networks  of 
queues.  The  accuracy  of  the  approximation  technique  depends  only 
upon  the  reqenerative  nature  of  the  system,  i.e.,  on  the  fact 
that  renewal  points  can  be  defined  and  statistical  limit  theorems 
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applied.  This  behaviour  is  characteristic  of  general  networks  as 
well.  Provided  that  a  cycle  can  be  defined,  the  expected 
duration  of  that  cycle  determined,  and  the  distribution  of  number 
of  cycles  expressed  using  the  method  of  stages,  the  approximation 
techniques  of  Chapter  4  can  be  applied. 

Next,  consider  a  system  that  supports  both  batch  and 
interactive  customers.  We  might  well  wish  to  investigate  the 
effect  of  various  batch  multiprogramming  levels  on  the 
distribution  of  response  times  for  interactive  customers.  The 
formal  development  in  Chapter  4  considers  only  systems  with  a 
single  customer  class.  However,  if  the  discrete  equivalent  of 
the  method  of  stages  can  be  used  to  represent  the  distribution  of 
the  number  of  cycles  required  by  interactive  customers,  then  the 
distribution  of  response  times  for  these  customers  can  be 
obtained  easily.  All  that  is  required  is  the  expected  duration 
of  a  single  cycle  through  the  network  for  interactive  customers, 
which  can  be  obtained  from  data  provided  by  any  queueing  network 
solution  technique.  In  fact,  the  analysis  of  the  TSO  system  at 
the  University  of  Toronto  Computer  Centre  provides  an 
illustration  of  this  approach. 

Finally,  consider  this  latter  extension  in  the  case  where  the 
distribution  of  number  of  cycles  for  interactive  customers 
consists  of  several  distinct  modes.  A  difficulty  arises  in 
estimating  the  distribution  of  the  number  of  batch  customers  at  a 
service  center  at  the  arrival  instants  of  interactive  customers. 
While  the  arrival  instant  distribution  is  not  egual  to  the 
equilibrium  distribution,  neither  is  the  arrival  instant 
probability  of  encountering  N  batch  customers  (where  N  is  the 
batch  multiprogramming  level)  equal  to  zero. 

One  reasonable  conjecture  is  that  the  probability  of  an 
arriving  interactive  customer  encountering  n  batch  customers  at  a 
service  center  in  a  system  with  a  batch  multiprogramming  level  of 
N  is  equal  to  the  equilibrium  probability  that  there  are  n  batch 
customers  at  that  service  center  in  a  system  with  a  batch 
multiprogramming  level  of  N+1,  divided  by  the  equilibrium 
probability  that  there  are  N+1  batch  customers  present.  This 
conjecture  follows  from  combining  the  results  discussed  in 
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Chapter  4  for  the  single  server  systems  with  a  fixed  number  of 
customers  and  with  finite  waiting  room.  In  the  notation  of  that 
chapter,  the  conjecture  can  be  stated  as 


A  (n) 
N 


B  (N  +  1) 
N+1 


0  <n<N 


where  A  is  the  distribution  of  the  number  of  batch  customers 
present  at  the  arrival  instants  of  interactive  customers,  B  is 
the  equilibrium  distribution  of  the  number  of  batch  customers 
present  at  the  service  center,  and  the  subscripts  denote  the 
batch  multiprogramming  level. 

Based  cn  comparisons  with  several  simulations,  this 
approximation  appears  to  be  quite  accurate.  Additional  work  is 
required,  though. 


•  Obtaining  the  Distribution  of  Response  Times  Via  Transient 

Analysis  and  Regenerative  Simulation 

Transient  solutions  to  queueing  models  are  often  perceived  as 
either  unobtainable  or  unmanageable  [  Grassmann  1  975  ].  When  such 
solutions  are  possible,  though,  they  provide  a  natural  way  to 
determine  the  distribution  of  response  times  in  the  system. 

A  transient  analysis  of  response  times  involves  several 
steps.  First,  the  states  in  which  the  system  may  be  found  by  an 
arriving  customer  are  enumerated,  and  their  probabilities  are 
determined.  Next,  a  Harkov  chain  describing  the  progress  of  a 
specific  newly-arrived  customer  is  defined.  This  chain  includes 
absorption  states  corresponding  to  the  completion  of  that 
customer.  Finally,  the  linear  differential  equations  describing 
this  Harkov  chain  are  solved.  For  each  initial  state,  the 
probability  that  the  system  has  entered  an  absorption  state  at 
various  times  is  determined.  This  information,  along  with  the 
initial  state  probabilities,  allows  the  distribution  of  response 
times  to  be  determined. 


To  solve  the  large  systems  of  equations  that  result  from  such 
an  analysis,  Grassmann  [1977]  and  DeSantis  [1976]  use  the  method 
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of  randomization.  Neuts  [1977]  applies  more  conventional  Runge- 
Kutta  techniques.  While  Grassmann  and  Neuts  consider  only  single 
service  centers,  DeSantis  has  analyzed  a  simple  queueing  network 
model. 

Iglehart  and  Shedler  [1977]  have  obtained  the  distribution  of 
response  times  in  networks  of  queues  by  using  simulation  in  an 
analoqous  manner.  They  present  several  numerical  examples, 
including  an  analysis  of  resource  contention  in  the  DL/I 
component  of  the  IBM  Information  Management  System.  The  approach 
shows  considerable  promise. 

The  results  obtained  in  Chapter  4  of  this  thesis  indicate 
that  detailed  analyses  such  as  these  may  be  unnecessary  in  many 
applications.  Still,  these  approaches  provide  interesting 
alternatives,  and  a  comparison  with  their  predictions  would  be 
worthwhile . 

Conclusions 


This  thesis  has  addressed  two  of  the  more  significant 
problems  concerning  the  accuracy  and  versatility  of  queueing 
network  models.  Throughout,  the  emphasis  has  been  on  solutions 
with  both  a  mathematical  and  an  intuitive  foundation;  solutions 
of  theoretical  interest  and  practical  impact. 

To  the  extent  that  these  results  find  application  in 
performance  studies  of  production  systems,  the  objectives  of  the 
research  will  have  been  met. 
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